Stock Portfolio Analytics¶

Toronto, August, 30 2024
Autor : Atsu Vovor

Master of Management in Artificial Intelligence,
Data Analytics and Reporting Professional | Machine Learning | Data science | Quantitative Analysis |French Bilingua

Abstract¶

This project presents the development of an advanced stock portfolio analytics tool designed to assist portfolio managers in optimizing investment strategies. By leveraging statistical analysis, mathematical and machine learning techniques, the tool provides insights into stock asset pricing, risk assessment, asset allocation, and performance forecasting. The project outlines the methodology used, including data collection and preprocessing, explanatory datanalysis, model selection and evaluation metrics, stress testing under economic key performance indicators scenarios. Results demonstrate the tool's effectiveness in enhancing decision-making processes, potentially leading to improved portfolio performance. The findings highlight the importance of integrating modern analytics into traditional portfolio management to navigate the complexities of today's financial markets.

Introduction¶

The growing complexity of financial instruments and risk factors places significant pressure on portfolio managers, who must navigate and analyze a vast and intricate flow of data each day. Utilizing a robust dataset comprising historical stock prices, economic indicators, and financial metrics, our goal is to develop an advanced stock portfolio analysis tool that leverages advanced statistical methods, portfolio optimization and machine learning techniques to assist portfolio managers in making informed decisions. The tool provides insights into the asset pricing, risk assessment, asset allocation, and performance forecasting.

To achieve this goal, we begin by dynamically collecting real time data of all the S&P/TSX composite constituents adjust closed prices and canadian economic factors. The methodology used involves data preprocessing to ensure accuracy and relevance, followed by exploratory data analysis (EDA) to uncover key trends and correlations. Principal Component Analysis (PCA) is applied to reduce the dimensionality of the dataset, enabling the identification of the most influential factors affecting portfolio performance. We then use correlation analysis and hierarchical clustering to categorize stocks into distinct groups, facilitating diversification and risk management.

Moreover, the project explores advenced assets pricing technics sach as Stochastic Differencial Equation and Monte Carlo Simulation combined with modern portfolio theory (MPT) to simulate the portfolio price, profit & lost, risk and construct efficient portfolios, and stress testing techniques to evaluate portfolio robustness under various economic scenarios. The results demonstrate significant improvements in risk-adjusted returns, providing actionable insights for portfolio managers and investors.

In conclusion, this project underscores the importance of integrating advanced analytics into investment decision-making processes. The findings offer a valuable framework for optimizing stock portfolios, enhancing performance, and managing risk in an increasingly complex financial environment.

Description¶

This white paper presents an in-depth analysis of stock portfolio management through the application of advanced data analytics techniques. The project aims to address the challenges faced by investors in optimizing their portfolios by incorporating a data-driven approach to decision-making. By analyzing historical stock prices, financial indicators, and macroeconomic variables, the project seeks to develop strategies that maximize returns while minimizing risk.

Scope of the Project

The scope of this project includes the following key areas:

1. Data Collection and Preprocessing:

  • The project begins with the collection of a comprehensive dataset that includes historical stock prices, financial ratios, and relevant economic indicators.
  • Data preprocessing steps are undertaken to clean and prepare the data, ensuring accuracy, consistency, and relevance. This includes handling missing data, normalizing variables, and filtering out noise.

2. Exploratory Data Analysis (EDA):

  • EDA is conducted to uncover underlying trends, correlations, and patterns within the data. This step provides insights into the behavior of individual stocks and the market as a whole, laying the foundation for further analysis.
  • Visualization techniques are employed to illustrate key findings and to identify potential opportunities for portfolio optimization.

3. Dimensionality Reduction and Portfolio Construction using Correlation Analysis, Clustering and Principal Component Analysis (PCA)

  • Correlation Analysis, Clustering and Portfolio Construction Hierarchical clustering techniques are applied to group stocks into clusters based on their similarities in performance, risk profile, and other attributes. This clustering facilitates the selection of a diversified set of assets for portfolio construction, ensuring that the portfolio is balanced and less susceptible to market shocks.

  • Principal Component Analysis (PCA) To manage the complexity of the dataset and to focus on the most impactful variables, PCA is utilized to reduce the number of factors considered in the analysis.It helps in identifying the principal components that explain the majority of the variance in the data, enabling the selection of the most relevant indicators for portfolio construction.

  • Statcking PCA,Correlation Analysis and Clustering for Diversified Portfolio Construction stacking Correlation Analysis, Clustering and Principal Component Analysis (PCA) helps to construct a well diversified portfolio

4. Asset Pricing, Profit & Lost simation and Risk calculation*

  • Lognormal of asset returns, Covarariance Matrix Cholesky Decomposition applied to Monte Carlo Simulation for asset pricing and Profit & Lost simulation.
  • Value at Risk(VaR) and Conditional Value at Risk(CVaR) calculation

5. Portfolio Optimization:

  • Modern Portfolio Theory (MPT) is implemented to construct efficient portfolios that optimize the trade-off between risk and return.
  • The optimization process involves determining the boundary random portfolios assets and weights that maximize the portfolio's expected return for a given level of risk or minimize risk for a given level of expected return or a given risk level.
  • Using Monte Carlo simulation to generate Efficient Frontier
  • Machine Learning technics are used to improve the optimization process by modelling the boundary random portfolios assets that maximize the portfolio's expected return for a given level of risk or minimize risk for a given level of expected return or a given risk level.
  • Investment strategies are bult for optimal portfolios(minimal risk portfolio, maximal return portfolio, sharpe ratio (tangent portfolio)

6. Investment Risk Profiles Simulation using K-Means Clustering applied to random portfolio

  • The simulated portfolio risk is combigned with the simulated the portfolio expected return and the predicted expected return to set the randomn efficient frontier data. The randomn efficient frontier data is then used as input for the K-means cluster models to simulate the instment risk profile and investment strategy.

7. Stress Testing and Scenario Analysis:

  • Stress testing is conducted to evaluate the portfolio's performance under different economic scenarios, including adverse market conditions.
  • This analysis provides insights into the portfolio’s resilience and helps in identifying potential vulnerabilities.

Tools and Technologies

The project leverages various tools and technologies, including:

  • Python: For data analysis, statistical modeling, and machine learning.
  • Pandas and NumPy: For data manipulation and numerical computations.
  • Matplotlib and Seaborn: For data visualization.
  • Scikit-learn: For machine learning, PCA, and clustering.
  • Optimization Libraries: For portfolio optimization using MPT.
  • Financial Databases: To source historical data, including stock prices and economic indicators.

Key Outcomes The project yields several key outcomes:

  • Identification of the most influential economic indicators and stock characteristics for portfolio management.
  • Creation of optimized portfolios that demonstrate improved risk-adjusted returns.
  • Insights into portfolio performance under various market conditions, aiding in risk management and strategic planning.
In [1]:
#pip install stats-can
#conda pip install -c districtdatalabs yellowbrick on Anaconda Prompt
#conda install conda=24.5.0
#conda install conda-forge::stats_can 

Import Libraries¶

In [2]:
import yfinance as yf
import pandas as pd
from datetime import date, timedelta
import seaborn as sns 
import numpy as np
import matplotlib.pyplot as plt 
from sklearn.preprocessing import LabelEncoder
from scipy.stats import norm, lognorm, exponnorm, logistic, erlang,gennorm 
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
#from sklearn.metrics import root_mean_squared_error
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_log_error
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
#from yellowbrick.cluster import KElbowVisualizer
from scipy.optimize import curve_fit
import random
from statistics import NormalDist
from scipy import stats
from fitter import Fitter, get_common_distributions, get_distributions 
import matplotlib.transforms as transforms
from matplotlib.table import table
from numpy import arange
from pandas import read_csv
from scipy.optimize import curve_fit
import warnings
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.model_selection import train_test_split, cross_val_score
from tabulate import tabulate

#from stats_can import StatsCan as sc
from stats_can import StatsCan
sc = StatsCan()

#import pandas_datareader.data as web

1. Index Contents Data Collection and Preprocessing¶

In this section, we will read all the S&P/TSX composite constituents table from wikipedia(https://en.wikipedia.org/wiki/S%26P/TSX_Composite_Index). Then we will get the tickers adjusted close prices from Yahoo Finance using yfinance library. We will clean the data by removing all the empty rows and columns. With more than 200 remaining tickers, we will calculate the assets log return and we will remove all the assets with negative expected return. We will couple Correlation Analysis with Principle Component Analysis to reduce the volume of assets and keep only most important assets. The Correlation Analysis will be used to identify and remove redundant assets. The end result will be a well diversify portfolio.

In [3]:
#--------------------------------------------- 1. Index Contents Data Collection and Preprocessing ------------------------------------------------
#read the index content from wikipedia and return the index content data frame
def read_index_content(content_html,web_tab_number):
    S_and_P_TSX_Composite = pd.read_html(content_html)[web_tab_number] 
    index_content_df = S_and_P_TSX_Composite[['Ticker','Company','Sector [10]','Industry [10]']]
    index_content_df = index_content_df.rename(columns={"Sector [10]": "Sector", "Industry [10]": "Industry"})
    return index_content_df
    #return S_and_P_TSX_Composite[['Ticker','Company','Sector [10]','Industry [10]']].head()
    
    
#extract the index tickers 
def generate_ticker_df(index_content_df):
    index_content_tickers_list = index_content_df['Ticker']
    index_content_tickers_list = index_content_tickers_list.tolist()
    new_index_content_tickers_list = []
    for item in index_content_tickers_list:
        new_index_content_tickers_list.append(str(item))
    return new_index_content_tickers_list

  
#--------------------------------------------------------------------------------------------------------------------
#Description:Extract adj close price for each stock on the index from Yahoo Finance web site and clean the data 
#Input:start date, end date, index ticker list
#Return the index Adj close price data frame
#-----------------------------------------------------------------------------------------------------------------------
def start_date(reporting_year_period = 365*5):
    return pd.Timestamp.today() - pd.Timedelta(days = reporting_year_period)

def create_adj_close_price_df(reporting_year_period, content_ticker_list):
                          
    #frequency = frequency_date_column[0].upper()
    #selected_asset_list = get_selected_assets_list(log_returns,correlation_coefficient_treshold)
    start_date = reporting_year_period
    end_date = date.today()
    selected_assets_yahoo_adj_close_price_data = yf.download(content_ticker_list, start_date, end_date, ['Adj Close'], period ='max')
    selected_assets_adj_close_price_df = selected_assets_yahoo_adj_close_price_data['Adj Close']
    index_adj_close_price_df = selected_assets_adj_close_price_df.dropna(axis=1)
    return index_adj_close_price_df

def asset_daily_price(price_df,number_of_asset):
    print('\nPlotting the first 5 assets daily adj closed prices\n')
    price_df.iloc[:,:number_of_asset].plot(figsize=(15,6)) 
    plt.show()
    
def plot_assets_distribution(df,xlabel, ylabel, title=''):
    # Define the number of assets
    n_assets = df.shape[1]

    # Create subplots
    fig, axes = plt.subplots(1, n_assets, figsize=(23,  3))
   
    if n_assets == 1:
        axes = [axes]

    # Iterate over each asset
    for i, asset in enumerate(df.columns):
        g =sns.histplot(df[asset], kde=True, ax=axes[i])
        axes[i].set_title(f'{title + asset}')
        axes[i].set_xlabel(xlabel)
        axes[i].set_ylabel(ylabel)
        
        # Calculate and display statistics
        mean_return = df[asset].mean()
        std_dev = df[asset].std()
        skewness = df[asset].skew()
        kurtosis = df[asset].kurtosis()

         # Add statistics below the plot
        statistics = (f"Mean: {mean_return:.4f}\n"
                 f"Std Dev: {std_dev:.4f}\n"
                  f"Skewness: {skewness:.4f}\n"
                 f"Kurtosis: {kurtosis:.4f}")
    
        # Place the text under the plot
        axes[i].text(0.3, -0.3, statistics, transform=axes[i].transAxes, 
                fontsize=10, verticalalignment='top', bbox=dict(boxstyle="round,pad=0.3", edgecolor="black", facecolor="lightgrey"))

# Adjust layout
plt.tight_layout()
plt.show()

    
def normalize_asset_daily_price(price_df,number_of_asset):

    normalized_asset_daily_price_df = price_df.iloc[:,:number_of_asset]
    normalized_asset_daily_price_df = (normalized_asset_daily_price_df / normalized_asset_daily_price_df.iloc[0])*100
    normalized_asset_cols_size = len(normalized_asset_daily_price_df.columns)
    normalized_asset_daily_price_df.plot(figsize = (15, 6))
    plt.show()
    plot_assets_distribution(normalized_asset_daily_price_df, 'Adjusted Close Price','Frequency')
      
<Figure size 640x480 with 0 Axes>
In [4]:
print('\nData collection and preprocessing\n')
index_content_df = read_index_content('https://en.wikipedia.org/wiki/S%26P/TSX_Composite_Index',3)
content_ticker_list = generate_ticker_df(index_content_df)
index_adj_close_price_df = create_adj_close_price_df( start_date(365*5), content_ticker_list )
print('\nList of companies\n')
display(index_content_df)
print('\nAdjusted Close Price Data Frame\n')
display(index_adj_close_price_df)
print('\nData structure\n')
index_adj_close_price_df.info()
print('\nData statics summary\n')
display(index_adj_close_price_df.describe().transpose())
Data collection and preprocessing

[*********************100%%**********************]  225 of 225 completed
105 Failed downloads:
['CU', 'CCL.B', 'IVN', 'IFP', 'TIH', 'DFY', 'INE', 'GEI', 'POU', 'DSG', 'KEL', 'ABX', 'TOU', 'RUS', 'AOI', 'GWO', 'WSP', 'MRU', 'WTE', 'FRU', 'KXS', 'REI.UN', 'RCH', 'EMA', 'ARX', 'ATD', 'WPK', 'WN', 'CJT', 'FIL', 'NWC', 'CPX', 'MTY', 'LUG', 'AAV', 'IFC', 'LNR', 'BBD.B', 'SRU.UN', 'EQB', 'IMG', 'CFP', 'BIR', 'EFN', 'FFH', 'TOY', 'SIA', 'LUN', 'OLA', 'NPI', 'EIF', 'DML', 'FVI', 'KNT', 'WDO', 'WCP', 'ALA', 'MATR']: Exception('%ticker%: No price data found, symbol may be delisted (1d 2019-09-04 04:00:29.864463 -> 2024-09-02)')
['HWX']: Exception("%ticker%: Period 'max' is invalid, must be one of ['1d', '5d']")
['CHP.UN', 'CNR', 'TCL.A', 'ATH', 'ONEX', 'ACO.X', 'PKI', 'BDGI', 'AP.UN', 'ATRL', 'FTT', 'BEI.UN', 'TSU', 'IPCO', 'RCI.B', 'CRR.UN', 'CSH.UN', 'CCA', 'MTL', 'CSU', 'POW', 'CAR.UN', 'NWH.UN', 'CS', 'CRT.UN', 'GRT.UN', 'CTC.A', 'BBU.UN', 'PMZ.UN', 'BEP.UN', 'TA', 'IIP.UN', 'DPM', 'QBR.B', 'TECK.B', 'KMP.UN', 'EMP.A', 'FCR.UN', 'DIR.UN', 'BIP.UN', 'HR.UN', 'GIB.A']: Exception('%ticker%: No timezone found, symbol may be delisted')
['CPG', 'ENGH', 'TCN', 'ERF']: Exception('%ticker%: No data found, symbol may be delisted')

List of companies

Ticker Company Sector Industry
0 AAV Advantage Energy Ltd. Energy Oil & Gas Exploration and Production
1 AOI Africa Oil Corp. Energy Oil & Gas Exploration and Production
2 AEM Agnico Eagle Mines Limited Basic Materials Metals & Mining
3 AC Air Canada Industrials Transportation
4 AGI Alamos Gold Inc. Basic Materials Metals & Mining
... ... ... ... ...
220 WTE Westshore Terminals Investment Corporation Industrials Transportation
221 WPM Wheaton Precious Metals Corp. Basic Materials Metals & Mining
222 WCP Whitecap Resources Inc. Energy Oil & Gas Exploration and Production
223 WPK Winpak Ltd. Consumer Cyclical Packaging & Containers
224 WSP WSP Global Inc. Industrials Construction

225 rows × 4 columns

Adjusted Close Price Data Frame

AC AEM AGI AQN ATS BB BCE BHC BLDP BLX ... TLRY TPZ TRI TRP TVE VET WCN WFG WPM X
Date
2019-09-04 33.602268 56.552120 6.899674 10.177608 13.890000 6.91 35.615284 21.330000 4.63 12.608221 ... 30.000000 11.919843 61.598331 38.404881 22.843452 12.524200 88.492218 32.459225 28.881174 10.907244
2019-09-05 33.263344 54.074928 6.595137 10.124326 13.890000 7.28 35.652538 21.719999 4.53 13.009621 ... 32.080002 11.867153 62.446735 37.961239 22.790571 12.937088 88.424744 32.262451 28.131020 11.236592
2019-09-06 33.505432 52.435226 6.328667 10.124326 13.890000 7.19 35.898422 22.180000 4.61 13.109968 ... 32.060001 11.847400 62.552784 37.754208 22.737694 13.074717 88.578979 32.262451 27.005779 11.033171
2019-09-09 33.795937 50.874863 6.166883 10.093872 14.020000 6.98 36.039982 22.090000 5.10 13.425352 ... 30.150000 11.959357 61.209476 37.842941 22.587868 13.539213 86.583580 33.265079 26.358767 11.846856
2019-09-10 34.357590 49.975670 6.157364 10.032975 14.020000 7.15 35.958023 22.490000 5.00 13.547209 ... 31.129999 11.919843 60.051735 37.717232 22.587868 13.737055 86.072670 35.082947 26.143095 12.030904
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-08-26 33.250000 81.918884 19.570000 5.410000 27.180000 2.36 35.139999 5.970000 1.90 30.570000 ... 1.860000 18.000000 167.179993 45.439999 22.450001 10.350000 186.460007 90.309998 62.299999 37.820000
2024-08-27 32.150002 81.869118 19.490000 5.350000 27.059999 2.32 35.220001 5.960000 1.87 30.959999 ... 1.750000 18.070000 170.919998 45.680000 22.490000 10.170000 186.009995 88.959999 62.430000 37.980000
2024-08-28 32.720001 80.814285 19.110001 5.290000 26.740000 2.32 35.009998 5.900000 1.81 31.010000 ... 1.700000 17.850000 170.320007 45.450001 22.510000 10.140000 185.419998 88.370003 61.389999 37.389999
2024-08-29 33.320000 81.689995 19.170000 5.360000 26.730000 2.35 34.889999 5.870000 1.87 31.230000 ... 1.700000 18.250000 169.570007 45.750000 22.500000 10.310000 185.770004 88.930000 61.660000 38.560001
2024-08-30 33.330002 81.470001 19.280001 5.410000 26.850000 2.35 35.000000 5.930000 1.84 31.350000 ... 1.710000 18.350000 171.179993 46.340000 22.469999 10.280000 186.500000 88.519997 61.810001 37.910000

1257 rows × 98 columns

Data structure

<class 'pandas.core.frame.DataFrame'>
Index: 1257 entries, 2019-09-04 00:00:00 to 2024-08-30 00:00:00
Data columns (total 98 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   AC      1257 non-null   float64
 1   AEM     1257 non-null   float64
 2   AGI     1257 non-null   float64
 3   AQN     1257 non-null   float64
 4   ATS     1257 non-null   float64
 5   BB      1257 non-null   float64
 6   BCE     1257 non-null   float64
 7   BHC     1257 non-null   float64
 8   BLDP    1257 non-null   float64
 9   BLX     1257 non-null   float64
 10  BMO     1257 non-null   float64
 11  BN      1257 non-null   float64
 12  BNS     1257 non-null   float64
 13  BTE     1257 non-null   float64
 14  BTO     1257 non-null   float64
 15  BYD     1257 non-null   float64
 16  CAE     1257 non-null   float64
 17  CCO     1257 non-null   float64
 18  CG      1257 non-null   float64
 19  CIGI    1257 non-null   float64
 20  CIX     1257 non-null   float64
 21  CLS     1257 non-null   float64
 22  CM      1257 non-null   float64
 23  CNQ     1257 non-null   float64
 24  CP      1257 non-null   float64
 25  CVE     1257 non-null   float64
 26  CWB     1257 non-null   float64
 27  DOL     1257 non-null   float64
 28  DOO     1257 non-null   float64
 29  EFR     1257 non-null   float64
 30  ELD     1257 non-null   float64
 31  ENB     1257 non-null   float64
 32  EQX     1257 non-null   float64
 33  ERO     1257 non-null   float64
 34  FM      1257 non-null   float64
 35  FNV     1257 non-null   float64
 36  FR      1257 non-null   float64
 37  FSV     1257 non-null   float64
 38  FTS     1257 non-null   float64
 39  GIL     1257 non-null   float64
 40  GOOS    1257 non-null   float64
 41  GSY     1257 non-null   float64
 42  H       1257 non-null   float64
 43  HBM     1257 non-null   float64
 44  IAG     1257 non-null   float64
 45  IGM     1257 non-null   float64
 46  IMO     1257 non-null   float64
 47  K       1257 non-null   float64
 48  KEY     1257 non-null   float64
 49  L       1257 non-null   float64
 50  LAAC    1257 non-null   float64
 51  MAG     1257 non-null   float64
 52  MFC     1257 non-null   float64
 53  MG      1257 non-null   float64
 54  MX      1257 non-null   float64
 55  NAN     1257 non-null   float64
 56  NG      1257 non-null   float64
 57  NGD     1257 non-null   float64
 58  NTR     1257 non-null   float64
 59  NXE     1257 non-null   float64
 60  OGC     1257 non-null   float64
 61  OR      1257 non-null   float64
 62  OSK     1257 non-null   float64
 63  OTEX    1257 non-null   float64
 64  PAAS    1257 non-null   float64
 65  PBH     1257 non-null   float64
 66  PD      1257 non-null   float64
 67  PEY     1257 non-null   float64
 68  PPL     1257 non-null   float64
 69  PRMW    1257 non-null   float64
 70  PSI     1257 non-null   float64
 71  PSK     1257 non-null   float64
 72  QSR     1257 non-null   float64
 73  RY      1257 non-null   float64
 74  SAP     1257 non-null   float64
 75  SHOP    1257 non-null   float64
 76  SII     1257 non-null   float64
 77  SIL     1257 non-null   float64
 78  SJ      1257 non-null   float64
 79  SLF     1257 non-null   float64
 80  SPB     1257 non-null   float64
 81  SSL     1257 non-null   float64
 82  SSRM    1257 non-null   float64
 83  STN     1257 non-null   float64
 84  SU      1257 non-null   float64
 85  T       1257 non-null   float64
 86  TD      1257 non-null   float64
 87  TFII    1257 non-null   float64
 88  TLRY    1257 non-null   float64
 89  TPZ     1257 non-null   float64
 90  TRI     1257 non-null   float64
 91  TRP     1257 non-null   float64
 92  TVE     1257 non-null   float64
 93  VET     1257 non-null   float64
 94  WCN     1257 non-null   float64
 95  WFG     1257 non-null   float64
 96  WPM     1257 non-null   float64
 97  X       1257 non-null   float64
dtypes: float64(98)
memory usage: 972.2+ KB

Data statics summary

count mean std min 25% 50% 75% max
AC 1257.0 36.512448 3.092432 25.417492 34.379246 36.194168 38.320412 61.728195
AEM 1257.0 53.623814 9.305847 32.276482 47.323544 52.089771 58.276974 82.386589
AGI 1257.0 9.447117 3.169251 3.714411 7.307484 8.300611 11.715064 19.910000
AQN 1257.0 9.912172 2.794989 4.757608 6.779181 10.848129 12.408038 14.397228
ATS 1257.0 28.857884 10.614868 10.000000 16.709999 31.440001 38.150002 48.730000
... ... ... ... ... ... ... ... ...
VET 1257.0 11.536508 5.510147 1.587390 7.014824 11.788893 14.338865 28.071140
WCN 1257.0 123.783909 25.419136 69.163414 99.404480 127.301590 137.661545 186.500000
WFG 1257.0 67.408003 18.936303 14.714417 55.634064 73.254677 80.781425 98.821220
WPM 1257.0 40.842845 7.997292 22.332952 37.146351 41.381927 45.544857 62.430000
X 1257.0 23.187595 10.819288 4.768804 14.710779 23.290730 29.508034 49.411922

98 rows × 8 columns

2. Exploratory Data Analysis (EDA)¶

In [5]:
def plot_assets_distribution(df,xlabel, ylabel, title=''):
    # Define the number of assets
    n_assets = df.shape[1]

    # Create subplots
    fig, axes = plt.subplots(1, n_assets, figsize=(23,  3))
   
    if n_assets == 1:
        axes = [axes]

    # Iterate over each asset
    for i, asset in enumerate(df.columns):
        g =sns.histplot(df[asset], kde=True, ax=axes[i])
        axes[i].set_title(f'{title + asset}')
        axes[i].set_xlabel(xlabel)
        axes[i].set_ylabel(ylabel)
        
        # Calculate and display statistics
        mean_return = df[asset].mean()
        std_dev = df[asset].std()
        skewness = df[asset].skew()
        kurtosis = df[asset].kurtosis()

         # Add statistics below the plot
        statistics = (f"Mean: {mean_return:.4f}\n"
                 f"Std Dev: {std_dev:.4f}\n"
                  f"Skewness: {skewness:.4f}\n"
                 f"Kurtosis: {kurtosis:.4f}")
    
        # Place the text under the plot
        axes[i].text(0.3, -0.3, statistics, transform=axes[i].transAxes, 
                fontsize=10, verticalalignment='top', bbox=dict(boxstyle="round,pad=0.3", edgecolor="black", facecolor="lightgrey"))

# Adjust layout
plt.tight_layout()
plt.show()

    
def normalize_asset_daily_price(price_df,number_of_asset):

    normalized_asset_daily_price_df = price_df.iloc[:,:number_of_asset]
    normalized_asset_daily_price_df = (normalized_asset_daily_price_df / normalized_asset_daily_price_df.iloc[0])*100
    normalized_asset_cols_size = len(normalized_asset_daily_price_df.columns)
    normalized_asset_daily_price_df.plot(figsize = (15, 6))
    plt.show()
    plot_assets_distribution(normalized_asset_daily_price_df, 'Adjusted Close Price','Frequency')
<Figure size 640x480 with 0 Axes>
In [6]:
print('\nExploratory Data Analysis (EDA)\n')
number_of_asset =5
index_adj_close_price_df.iloc[0] # first row
asset_daily_price(index_adj_close_price_df,number_of_asset) #Plotting the first 5 assets daily adj closed prices
normalize_asset_daily_price(index_adj_close_price_df,number_of_asset)  #Normalization of adj closed prices to 100
Exploratory Data Analysis (EDA)


Plotting the first 5 assets daily adj closed prices

Assets log return Volatility Calculation¶

In this section, we will calculate the assets log return instead of arithmetic return. The arithmetic return is the percentage change in the asset's price from one period to the next where as the log return of an asset over a period is calculated as the natural logarithm of the ratio of the ending price to the starting price.

$$ Arithmetic Return: R = \frac{P_t - P_{t-1}}{P_{t-1}} $$

. $$ Log Return: \text{Log Return} = \ln\left(\frac{P_t}{P_{t-1}}\right) $$.

Throughout this project, we will use asset log returns instead of arithmetic returns, simply because, in the upcoming sections we will perform stochastic simulation of the stock prices to calculate Profit & Lost, VaR, CVaR and stress testing. Log returns are commonly used in the financial literature to perform financial modeling like asset prices modeling over time, as prices cannot be negative but can increase indefinitely. Log returns are normally distributed with Fat-Tailed that make them more likely to predict extreme returns than assuming arithmetic returns to be normally distributed. As we know, stocks are traded with very high frequency over very short period of time and the form of their distributions are unknown as we can see in the plottings above. This leads to use log returns witch naturally account for continuous compounding and more accurate instead of arithmetic returns witch are based on simple interest. Furthermore, as opposed to arithmetic returns, log returns are additive meaning that you can add log returns over multiple periods to get the total log return.

In [7]:
def calculate_stock_price_log_return(index_adj_close_price_df):
    log_returns = np.log(index_adj_close_price_df / index_adj_close_price_df.shift(1))
    log_returns = log_returns.dropna(how = 'all')
    return log_returns

#removing asset with negative expected return
def removing_assets_with_negative_expected_return(log_returns,threshold):
    # Calculate the correlation matrix
    #corr_matrix = expected_returns.corr()
    # Create a list to store uncorrelated assets
    assets_with_positive_expected_return = []
    # Iterate through the correlation matrix
    for asset in log_returns.columns:
        # Check if the asset is uncorrelated with all other assets
        #for other_assets in corr_matrix.columns:
            if log_returns.mean()[asset] > threshold:
                assets_with_positive_expected_return.append(asset) 
    assets_with_positive_expected_return_list = list(dict.fromkeys(assets_with_positive_expected_return))
    return assets_with_positive_expected_return_list


def positive_assets_log_returns_df(log_returns_df, positive_assets_list):
    return log_returns_df[positive_assets_list]

def stocks_initial_price(positive_assets_list):  
    return index_adj_close_price_df.iloc[0][positive_assets_list]

def generate_asset_volatility(frequency_date_column, log_return_df):
    #selected_content_ticker_list = get_selected_assets_list(log_returns,correlation_coefficient_treshold)
                          
    frequency = frequency_date_column[0].upper()
   

    assets_volatility_df = log_return_df.rolling(center=False,window= 252).std() * np.sqrt(252)
    for col in list(assets_volatility_df.columns):
        assets_volatility_df = assets_volatility_df.rename(columns={col: col+' Volatility'})
    
    assets_volatility_df = assets_volatility_df.dropna(axis=0)
    
    assets_volatility_df[frequency_date_column] = pd.to_datetime(assets_volatility_df.index, format = '%m/%Y')
    assets_volatility_df[frequency_date_column] = assets_volatility_df[frequency_date_column].dt.to_period(frequency)
        
    assets_volatility_df.set_index(frequency_date_column, inplace=True)
    assets_volatilities = assets_volatility_df.groupby(frequency_date_column).mean()
    assets_volatilities = round(assets_volatilities,1)
    assets_volatilities = assets_volatilities.dropna(axis=0)
    return assets_volatilities
    
    
def portfolio_arihtmetics(log_returns,stocks_initial_prices):
    return pd.DataFrame({'mu expected_return':log_returns.mean(),
                         'variance':log_returns.var(),
                         'Sigmas(volatilities)':log_returns.std(),
                         'modifiy shape(Er)/𝝈':log_returns.mean()/log_returns.std(),
                         'initial price': stocks_initial_prices}).transpose()


stock_price_log_return = calculate_stock_price_log_return(index_adj_close_price_df)
log_returns = positive_assets_log_returns_df(stock_price_log_return, 
                                             removing_assets_with_negative_expected_return(stock_price_log_return,0))
asset_volatility_df = generate_asset_volatility('Quater', log_returns)
positive_assets_list = removing_assets_with_negative_expected_return(stock_price_log_return,0)
stocks_initial_prices = stocks_initial_price(positive_assets_list)
portfolio_arihtmetics_df = portfolio_arihtmetics(log_returns,stocks_initial_prices).transpose()
In [8]:
print('\nAssets log return data frame\n')
display(log_returns)
print('\nAssets volatility data frame\n')
display(asset_volatility_df)
print('\nPortfolio arithmetics\n')
display(portfolio_arihtmetics_df)
plot_assets_distribution(log_returns.iloc[:,:number_of_asset], 'log_returns','Frequency')
plot_assets_distribution(asset_volatility_df.iloc[:,:number_of_asset], 'Volatility','Frequency')
Assets log return data frame

AEM AGI ATS BLX BMO BN BNS BTE BTO BYD ... T TD TFII TPZ TRI TRP WCN WFG WPM X
Date
2019-09-05 -0.044792 -0.045142 0.000000 0.031340 0.016443 0.012316 0.007420 0.030772 0.028079 0.023600 ... 0.004748 0.009185 0.000000 -0.004430 0.013679 -0.011619 -0.000763 -0.006081 -0.026317 0.029748
2019-09-06 -0.030792 -0.041243 0.000000 0.007684 0.008908 0.007131 0.010479 0.029853 -0.002579 -0.004174 ... 0.009981 0.010911 0.041314 -0.001666 0.001697 -0.005469 0.001743 0.000000 -0.040822 -0.018269
2019-09-09 -0.030210 -0.025896 0.009316 0.023772 0.013638 -0.008073 0.008377 0.043172 0.027067 0.042978 ... 0.014787 0.005411 0.000000 0.009406 -0.021709 0.002348 -0.022784 0.030604 -0.024250 0.071156
2019-09-10 -0.017833 -0.001545 0.000000 0.009036 0.018039 -0.011756 0.005426 0.000000 0.018675 0.024145 ... 0.021246 0.011802 0.000000 -0.003310 -0.019096 -0.003327 -0.005918 0.053207 -0.008216 0.015416
2019-09-11 0.009306 0.000000 0.000000 0.039424 0.002630 0.007411 0.007547 -0.007067 0.011954 0.035725 ... 0.030401 0.001599 0.022345 -0.003875 -0.017369 -0.018200 -0.003029 0.005594 0.006792 0.063179
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-08-26 -0.005693 -0.002552 0.001841 0.009201 0.005535 0.005483 0.003505 0.019152 0.009959 -0.001834 ... 0.001519 -0.002868 0.009018 0.005571 0.005638 0.008176 0.000161 0.006889 0.003054 0.018144
2024-08-27 -0.000608 -0.004096 -0.004425 0.012677 -0.063600 0.008469 0.027805 -0.019152 -0.011477 0.000000 ... -0.005582 0.008581 -0.015835 0.003881 0.022125 0.005268 -0.002416 -0.015061 0.002085 0.004222
2024-08-28 -0.012968 -0.019690 -0.011896 0.001614 -0.017194 -0.006649 -0.021241 -0.011111 0.009674 -0.008379 ... 0.008614 -0.004365 -0.003695 -0.012250 -0.003517 -0.005048 -0.003177 -0.006654 -0.016799 -0.015656
2024-08-29 0.010778 0.003135 -0.000374 0.007069 0.013104 0.004437 0.006318 0.024829 0.002104 0.008546 ... -0.003032 0.000336 0.001883 0.022162 -0.004413 0.006579 0.001886 0.006317 0.004388 0.030812
2024-08-30 -0.002697 0.005722 0.004479 0.003835 0.007924 0.011804 0.013320 -0.030431 0.004194 0.001501 ... 0.007060 0.007875 -0.004782 0.005465 0.009450 0.012814 0.003922 -0.004621 0.002430 -0.017001

1256 rows × 76 columns

Assets volatility data frame

AEM Volatility AGI Volatility ATS Volatility BLX Volatility BMO Volatility BN Volatility BNS Volatility BTE Volatility BTO Volatility BYD Volatility ... T Volatility TD Volatility TFII Volatility TPZ Volatility TRI Volatility TRP Volatility WCN Volatility WFG Volatility WPM Volatility X Volatility
Quater
2020Q3 0.5 0.7 0.4 0.7 0.5 0.5 0.4 1.1 0.8 0.9 ... 0.3 0.4 0.5 0.8 0.3 0.5 0.3 0.7 0.5 0.8
2020Q4 0.5 0.7 0.4 0.7 0.5 0.5 0.5 1.1 0.8 0.9 ... 0.4 0.4 0.5 0.8 0.3 0.5 0.3 0.7 0.5 0.8
2021Q1 0.5 0.7 0.4 0.6 0.5 0.5 0.4 1.1 0.8 0.9 ... 0.3 0.4 0.5 0.7 0.3 0.5 0.3 0.7 0.5 0.8
2021Q2 0.4 0.5 0.4 0.4 0.3 0.3 0.2 0.8 0.4 0.5 ... 0.2 0.2 0.4 0.3 0.2 0.3 0.2 0.5 0.4 0.8
2021Q3 0.4 0.5 0.4 0.3 0.2 0.3 0.2 0.7 0.3 0.4 ... 0.2 0.2 0.4 0.2 0.2 0.2 0.1 0.4 0.4 0.7
2021Q4 0.3 0.4 0.4 0.2 0.2 0.3 0.2 0.7 0.3 0.4 ... 0.2 0.2 0.4 0.2 0.2 0.2 0.1 0.4 0.4 0.7
2022Q1 0.3 0.4 0.3 0.2 0.2 0.3 0.2 0.6 0.3 0.4 ... 0.2 0.2 0.4 0.2 0.2 0.2 0.2 0.4 0.3 0.6
2022Q2 0.4 0.4 0.4 0.2 0.2 0.3 0.2 0.6 0.4 0.4 ... 0.2 0.2 0.4 0.2 0.2 0.2 0.2 0.4 0.3 0.5
2022Q3 0.4 0.4 0.4 0.2 0.2 0.3 0.2 0.7 0.4 0.4 ... 0.3 0.2 0.4 0.2 0.2 0.2 0.2 0.4 0.3 0.5
2022Q4 0.4 0.5 0.5 0.3 0.3 0.3 0.2 0.7 0.4 0.4 ... 0.3 0.2 0.4 0.2 0.2 0.3 0.2 0.5 0.4 0.6
2023Q1 0.4 0.4 0.5 0.3 0.3 0.4 0.2 0.7 0.4 0.4 ... 0.3 0.2 0.4 0.2 0.2 0.3 0.2 0.5 0.4 0.5
2023Q2 0.4 0.4 0.4 0.3 0.3 0.3 0.2 0.6 0.4 0.3 ... 0.3 0.2 0.4 0.2 0.2 0.3 0.2 0.4 0.4 0.5
2023Q3 0.4 0.4 0.4 0.3 0.2 0.3 0.2 0.5 0.4 0.3 ... 0.3 0.2 0.4 0.2 0.2 0.3 0.2 0.3 0.3 0.5
2023Q4 0.3 0.3 0.3 0.3 0.2 0.3 0.2 0.5 0.4 0.3 ... 0.3 0.2 0.3 0.1 0.2 0.3 0.2 0.3 0.3 0.5
2024Q1 0.3 0.3 0.3 0.3 0.2 0.3 0.2 0.4 0.4 0.3 ... 0.3 0.2 0.3 0.1 0.2 0.2 0.2 0.3 0.3 0.5
2024Q2 0.3 0.3 0.3 0.3 0.2 0.3 0.2 0.4 0.3 0.3 ... 0.2 0.2 0.3 0.1 0.2 0.2 0.2 0.3 0.3 0.5
2024Q3 0.3 0.3 0.3 0.3 0.2 0.3 0.2 0.4 0.3 0.3 ... 0.2 0.2 0.3 0.1 0.2 0.2 0.2 0.3 0.3 0.4

17 rows × 76 columns

Portfolio arithmetics

mu expected_return variance Sigmas(volatilities) modifiy shape(Er)/𝝈 initial price
AEM 0.000291 0.000604 0.024567 0.011831 56.552120
AGI 0.000818 0.000922 0.030356 0.026952 6.899674
ATS 0.000525 0.000568 0.023823 0.022028 13.890000
BLX 0.000725 0.000564 0.023755 0.030529 12.608221
BMO 0.000341 0.000350 0.018697 0.018251 54.471703
... ... ... ... ... ...
TRP 0.000150 0.000363 0.019042 0.007853 38.404881
WCN 0.000594 0.000192 0.013856 0.042837 88.492218
WFG 0.000799 0.000809 0.028440 0.028086 32.459225
WPM 0.000606 0.000530 0.023014 0.026322 28.881174
X 0.000992 0.001458 0.038187 0.025974 10.907244

76 rows × 5 columns

3. Dimensionality Reduction & Portfolio Construction using Correlation Analysis, Clustering and PCA¶

In this section, we will stack Correlation Analysis and the Principal Component Analysis (PCA) to create a diversified portfolio containing only the most important assets with less correlation. The Principal Component Analysis (PCA) is a dimensionality reduction technique aimed at reducing the number of assets. The PCA process will take the log returns of the assets as input and will produce a correlation matrix as output by transforming the original set of assets into a smaller set of uncorrelated variables called principal components. These components capture the majority of the variance in the data. The correlation analysis process will use the correlation matrix produced by PCA and will analyze the correlation between the most important assets selected by PCA. the highly correlated assets that may be redundant will be dropped. The remaining assets are expected to maintain a well-diversified portfolio.

Correlation Analysis¶

In [9]:
def generate_correlation_matrix(log_returns):
    return log_returns.corr(method='pearson')

def uncorrelated_assets_returns_log_returns_df(log_returns_df, uncorrelated_assets_list):
    return log_returns_df[uncorrelated_assets_list]

def selecting_important_assets_treshold_covariance_method(df,correlation_coefficient_treshold):
    
    return df[(df < correlation_coefficient_treshold).any(axis=1)].index.to_list()

def get_selected_assets_list(log_returns, correlation_coefficient_treshold):
    corr_mat = generate_correlation_matrix(log_returns)
    return selecting_important_assets_treshold_covariance_method(corr_mat,correlation_coefficient_treshold)


def get_selected_assets_log_return( frequency_date_column, log_returns, correlation_coefficient_treshold):
    frequency = frequency_date_column[0].upper()
    #selected_asset_list = selecting_uncorrelated_assets(log_returns,threshold)
    selected_asset_list = get_selected_assets_list(log_returns, correlation_coefficient_treshold)
    #display(selected_asset_list)
    selected_assets_log_returns_df = uncorrelated_assets_returns_log_returns_df(log_returns, selected_asset_list)
        
    for col in range(len(selected_asset_list)):
        selected_assets_log_returns_df = selected_assets_log_returns_df.rename(columns={selected_assets_log_returns_df.columns[col]: 
                                                                                                                selected_asset_list[col]+' Log return'})
    
    selected_assets_log_returns_df[frequency_date_column] = pd.to_datetime(selected_assets_log_returns_df.index, format = '%m/%Y')
    selected_assets_log_returns_df[frequency_date_column] = selected_assets_log_returns_df[frequency_date_column].dt.to_period(frequency)
        
    selected_assets_log_returns_frequency_df = selected_assets_log_returns_df
    selected_assets_log_returns_frequency_df.set_index(frequency_date_column, inplace=True)
    selected_assets_log_returns = selected_assets_log_returns_frequency_df.groupby(frequency_date_column).mean()   
        
    return selected_assets_log_returns

def selected_assets_log_return_var_covar_mat( frequency_date_column, log_returns, correlation_coefficient_treshold):
    selected_assets_log_return_df  = get_selected_assets_log_return( frequency_date_column, log_returns, correlation_coefficient_treshold)
  
    return generate_correlation_matrix(selected_assets_log_return_df)

def get_selected_assets_corr_mat_clustermap( frequency_date_column, log_returns, correlation_coefficient_treshold):
   
    selected_assets_log_return_df = get_selected_assets_log_return(frequency_date_column, log_returns, 
                                                                   correlation_coefficient_treshold)
   
    g = sns.clustermap(selected_assets_log_return_df.corr(),  method = 'complete', cmap   = 'RdBu',  annot  = True,  annot_kws = {'size': 8})
    plt.setp(g.ax_heatmap.get_xticklabels(), rotation=60)   
    

def get_selected_assets_volatility(assets_volatility_df, selected_content_ticker_list):
    
    for col in list(assets_volatility_df.columns):
        assets_volatility_df = assets_volatility_df.rename(columns={col: col.replace(' Volatility', '')})
        
    return assets_volatility_df[selected_content_ticker_list]


#---------------------------------------------- MAIN FUNCTION ---------------------------------------------------------------------------

def portfolio_diversification_and_assets_volatility_Corr_Analysis_main_function():
    #cumulative_variance_treshold = 1.0
    #threshold_for_highest_loadings = 0.5
    correlation_coefficient_treshold = 0.045
    
    selected_assets_list = get_selected_assets_list(log_returns,correlation_coefficient_treshold)
    

    print('\n                           ************************************************************\n'+
             '                                            All the Initial Assets Log Returns \n'+
              '                           ************************************************************\n')
    display(log_returns)
    
    print('\n                   ***************************************************************************\n'+
             '                       Diversified Portfolio Assets Log Returns- Correlation Analysis Methode \n'+
              '                   ***************************************************************************\n')
    selected_assets_log_return_df = get_selected_assets_log_return( 'quarter', log_returns, correlation_coefficient_treshold)        
    display(selected_assets_log_return_df)
    
    print('\n                  *************************************************************************\n'+
             '                  Diversified Portfolio Assets Volatility - Correlation Analysis Methode   \n'+
              '                  *************************************************************************\n')
    selected_assets_volatility_df = get_selected_assets_volatility(asset_volatility_df, selected_assets_list)
    display(selected_assets_volatility_df)
    
    
    print('\n                      *************************************************************************\n'+
             '                            Diversified Portfolio Assets correlation Matrix Cluster Map \n'+
              '                      *************************************************************************\n')
    get_selected_assets_corr_mat_clustermap( 'day', log_returns, correlation_coefficient_treshold)
                                                   
    

    

portfolio_diversification_and_assets_volatility_Corr_Analysis_main_function()
                           ************************************************************
                                            All the Initial Assets Log Returns 
                           ************************************************************

AEM AGI ATS BLX BMO BN BNS BTE BTO BYD ... T TD TFII TPZ TRI TRP WCN WFG WPM X
Date
2019-09-05 -0.044792 -0.045142 0.000000 0.031340 0.016443 0.012316 0.007420 0.030772 0.028079 0.023600 ... 0.004748 0.009185 0.000000 -0.004430 0.013679 -0.011619 -0.000763 -0.006081 -0.026317 0.029748
2019-09-06 -0.030792 -0.041243 0.000000 0.007684 0.008908 0.007131 0.010479 0.029853 -0.002579 -0.004174 ... 0.009981 0.010911 0.041314 -0.001666 0.001697 -0.005469 0.001743 0.000000 -0.040822 -0.018269
2019-09-09 -0.030210 -0.025896 0.009316 0.023772 0.013638 -0.008073 0.008377 0.043172 0.027067 0.042978 ... 0.014787 0.005411 0.000000 0.009406 -0.021709 0.002348 -0.022784 0.030604 -0.024250 0.071156
2019-09-10 -0.017833 -0.001545 0.000000 0.009036 0.018039 -0.011756 0.005426 0.000000 0.018675 0.024145 ... 0.021246 0.011802 0.000000 -0.003310 -0.019096 -0.003327 -0.005918 0.053207 -0.008216 0.015416
2019-09-11 0.009306 0.000000 0.000000 0.039424 0.002630 0.007411 0.007547 -0.007067 0.011954 0.035725 ... 0.030401 0.001599 0.022345 -0.003875 -0.017369 -0.018200 -0.003029 0.005594 0.006792 0.063179
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-08-26 -0.005693 -0.002552 0.001841 0.009201 0.005535 0.005483 0.003505 0.019152 0.009959 -0.001834 ... 0.001519 -0.002868 0.009018 0.005571 0.005638 0.008176 0.000161 0.006889 0.003054 0.018144
2024-08-27 -0.000608 -0.004096 -0.004425 0.012677 -0.063600 0.008469 0.027805 -0.019152 -0.011477 0.000000 ... -0.005582 0.008581 -0.015835 0.003881 0.022125 0.005268 -0.002416 -0.015061 0.002085 0.004222
2024-08-28 -0.012968 -0.019690 -0.011896 0.001614 -0.017194 -0.006649 -0.021241 -0.011111 0.009674 -0.008379 ... 0.008614 -0.004365 -0.003695 -0.012250 -0.003517 -0.005048 -0.003177 -0.006654 -0.016799 -0.015656
2024-08-29 0.010778 0.003135 -0.000374 0.007069 0.013104 0.004437 0.006318 0.024829 0.002104 0.008546 ... -0.003032 0.000336 0.001883 0.022162 -0.004413 0.006579 0.001886 0.006317 0.004388 0.030812
2024-08-30 -0.002697 0.005722 0.004479 0.003835 0.007924 0.011804 0.013320 -0.030431 0.004194 0.001501 ... 0.007060 0.007875 -0.004782 0.005465 0.009450 0.012814 0.003922 -0.004621 0.002430 -0.017001

1256 rows × 76 columns

                   ***************************************************************************
                       Diversified Portfolio Assets Log Returns- Correlation Analysis Methode 
                   ***************************************************************************

AGI Log return BLX Log return BTO Log return BYD Log return CIX Log return ELD Log return ERO Log return FNV Log return GSY Log return H Log return K Log return KEY Log return LAAC Log return SHOP Log return TPZ Log return
quarter
2019Q3 -0.012309 0.006967 0.004450 0.001329 -0.003165 -0.000018 -0.000722 -0.005514 0.000208 0.000759 0.000879 0.004676 -0.002319 -0.011955 0.000107
2019Q4 0.000609 0.001378 0.002133 0.003524 0.000353 0.000880 0.003530 0.001993 0.000097 0.003116 0.001263 0.002122 0.000603 0.003804 -0.000589
2020Q1 -0.002932 -0.011437 -0.010657 -0.011784 0.000771 -0.002962 -0.014102 -0.000564 -0.000254 -0.010083 -0.002142 -0.010603 -0.002829 0.000766 -0.013729
2020Q2 0.010017 0.002108 0.003883 0.005891 -0.001401 0.001453 0.010335 0.005408 0.000431 0.000773 0.001669 0.002803 0.010238 0.013059 0.004576
2020Q3 -0.000956 0.001191 -0.000692 0.006003 0.001339 0.000101 0.000006 0.000021 0.000064 0.000929 -0.000225 -0.000096 0.012647 0.001169 -0.000649
2020Q4 -0.000071 0.004429 0.005725 0.005241 -0.000659 0.001615 0.001666 -0.001652 0.000051 0.005159 -0.000441 0.005160 0.001515 0.001582 0.004346
2021Q1 -0.001812 -0.000481 0.004112 0.005205 0.004082 -0.001170 0.000851 0.000033 -0.000002 0.001767 0.000442 0.003378 0.004053 -0.000373 0.001637
2021Q2 -0.000283 0.000512 -0.000175 0.000667 0.002409 0.000476 0.003291 0.002358 0.000024 -0.001002 0.000397 0.000650 -0.001264 0.004411 0.002179
2021Q3 -0.000897 0.002296 0.002156 0.000443 0.000144 -0.000482 -0.002642 -0.001691 0.000010 -0.000109 0.000044 0.000856 0.006384 -0.001168 -0.000340
2021Q4 0.001082 -0.000641 0.001621 0.000560 0.001362 -0.000474 -0.002331 0.001012 -0.000033 0.003409 0.000264 0.001188 0.004148 0.000247 0.000651
2022Q1 0.001512 -0.000763 -0.002177 0.000089 0.000921 -0.000921 -0.000701 0.002337 -0.000131 -0.000076 0.000161 -0.000407 0.004500 -0.011481 0.001025
2022Q2 -0.002877 -0.002303 -0.001007 -0.004458 -0.000048 -0.001185 -0.008903 -0.003068 -0.000053 -0.004125 0.001763 -0.004059 -0.010455 -0.012449 -0.001722
2022Q3 0.000897 0.000017 -0.002257 -0.000625 -0.004366 -0.000655 0.004158 -0.001466 0.000021 0.001424 -0.000246 -0.000972 0.004136 -0.002314 0.000346
2022Q4 0.004971 0.003669 0.001159 0.002182 0.002257 0.001198 0.003542 0.002147 0.000162 0.001759 0.000486 0.001502 -0.005160 0.004022 0.000717
2023Q1 0.003108 0.001354 -0.001074 0.002657 -0.000145 0.000956 0.004018 0.001106 0.000196 0.003417 -0.000858 -0.005146 0.002230 0.005208 0.000497
2023Q2 -0.000382 0.004074 -0.001725 0.001307 0.003240 0.000503 0.002210 -0.000320 0.000192 0.000420 0.000250 -0.004569 -0.001192 0.004811 0.000840
2023Q3 -0.000829 -0.000459 0.000063 -0.002045 -0.002345 -0.000659 -0.002539 -0.001010 0.000218 -0.001202 -0.001821 0.002719 -0.002736 -0.002678 0.000545
2023Q4 0.002830 0.002637 0.002506 0.000499 0.005045 0.001270 -0.001395 -0.002905 0.000322 0.003298 0.000182 0.004898 -0.001273 0.005650 0.001124
2024Q1 0.001518 0.003247 0.000090 0.001233 0.005166 -0.000193 0.003274 0.001243 0.000224 0.003330 0.000564 0.001769 -0.002609 -0.000154 0.002107
2024Q2 0.000995 0.000298 -0.000492 -0.003130 -0.005030 -0.000390 0.001641 -0.000038 0.000218 -0.000769 0.000256 -0.001467 -0.008276 -0.002470 0.000527
2024Q3 0.004697 0.001639 0.004034 0.001944 0.005704 0.000898 -0.000900 0.000686 0.000304 0.000023 0.007607 0.004431 -0.004372 0.002605 0.003693
                  *************************************************************************
                  Diversified Portfolio Assets Volatility - Correlation Analysis Methode   
                  *************************************************************************

AGI BLX BTO BYD CIX ELD ERO FNV GSY H K KEY LAAC SHOP TPZ
Quater
2020Q3 0.7 0.7 0.8 0.9 0.6 0.2 0.7 0.4 0.0 0.6 0.3 0.7 0.8 0.7 0.8
2020Q4 0.7 0.7 0.8 0.9 0.7 0.2 0.7 0.4 0.0 0.6 0.3 0.7 1.0 0.7 0.8
2021Q1 0.7 0.6 0.8 0.9 0.6 0.2 0.7 0.4 0.0 0.6 0.3 0.7 1.1 0.7 0.7
2021Q2 0.5 0.4 0.4 0.5 0.6 0.1 0.5 0.3 0.0 0.5 0.2 0.5 1.0 0.6 0.3
2021Q3 0.5 0.3 0.3 0.4 0.6 0.1 0.5 0.3 0.0 0.4 0.2 0.4 1.0 0.5 0.2
2021Q4 0.4 0.2 0.3 0.4 0.6 0.1 0.5 0.3 0.0 0.4 0.2 0.3 0.9 0.5 0.2
2022Q1 0.4 0.2 0.3 0.4 0.5 0.1 0.5 0.3 0.0 0.3 0.2 0.3 0.8 0.6 0.2
2022Q2 0.4 0.2 0.4 0.4 0.4 0.1 0.5 0.3 0.0 0.4 0.2 0.3 0.8 0.7 0.2
2022Q3 0.4 0.2 0.4 0.4 0.4 0.1 0.6 0.3 0.0 0.4 0.2 0.3 0.8 0.9 0.2
2022Q4 0.5 0.3 0.4 0.4 0.4 0.1 0.6 0.3 0.0 0.4 0.2 0.3 0.8 0.9 0.2
2023Q1 0.4 0.3 0.4 0.4 0.4 0.1 0.7 0.3 0.0 0.4 0.2 0.4 0.7 0.9 0.2
2023Q2 0.4 0.3 0.4 0.3 0.5 0.1 0.7 0.3 0.0 0.4 0.2 0.5 0.6 0.8 0.2
2023Q3 0.4 0.3 0.4 0.3 0.5 0.1 0.6 0.3 0.0 0.3 0.2 0.6 0.5 0.7 0.2
2023Q4 0.3 0.3 0.4 0.3 0.4 0.1 0.5 0.3 0.0 0.3 0.2 0.6 0.6 0.6 0.1
2024Q1 0.3 0.3 0.4 0.3 0.6 0.1 0.5 0.3 0.0 0.3 0.2 0.6 0.6 0.6 0.1
2024Q2 0.3 0.3 0.3 0.3 0.7 0.1 0.5 0.3 0.0 0.3 0.2 0.4 0.6 0.5 0.1
2024Q3 0.3 0.3 0.3 0.3 0.8 0.1 0.5 0.3 0.0 0.3 0.2 0.4 0.6 0.5 0.1
                      *************************************************************************
                            Diversified Portfolio Assets correlation Matrix Cluster Map 
                      *************************************************************************

Principal Components Analysis(PCA)¶

In [10]:
#Selecting most important economic factors
#-------------------------------------------------------------------------------
 #Principal Components Analysis(PCA) to select most importance assets   
#-------------------------------------------------------------------------------



def selecting_important_item_PCA_treshold_method(matrix,threshold):
    
    return matrix[(matrix.abs() > threshold).any(axis=1)].index.to_list()

def selecting_important_item_corr_treshold_method(matrix,threshold):
    
    return matrix[(matrix < threshold).any(axis=1)].index.to_list()

def setting_PCA_for_assets_selection(log_returns_df):
    # economic indicators dataset
   
    # Standardizing the data
    scaler = StandardScaler()
    scaled_data_df = scaler.fit_transform(log_returns_df)

    # Applying PCA
    all_pca = PCA(n_components=None)  # Use all components to find the best number of important indicators
    all_principal_components = all_pca.fit_transform(scaled_data_df)

    # Explained variance
    explained_variance = all_pca.explained_variance_ratio_

    # Principal Component Loadings(coefficients)
    loadings_matrix = all_pca.components_

    # Create a DataFrame for loadings 
    loadings_matrix_df = pd.DataFrame(loadings_matrix.T, columns=[f'PC{i+1}' for i in range(loadings_matrix.shape[0])], 
                                      index=log_returns.columns)

   
    return loadings_matrix_df, explained_variance

#----------------------

def get_num_components(explained_variance,cumulative_variance_treshold = 0.9):    
    # Determine the number of components explaining the cumulative varience treshold of the variance
    cumulative_variance = explained_variance.cumsum()
    return  (cumulative_variance <= cumulative_variance_treshold).sum() + 1

def select_top_components_df(loadings_matrix_df, num_components, threshold_for_high_loadings = 0.5):
    # Select top components
    return loadings_matrix_df.iloc[:, :num_components]
     
def select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_high_loadings = 0.5):
    # Select top components
    selected_components_df = loadings_matrix_df.iloc[:, :num_components]
    # Find indicators with high loadings
    return selected_components_df[(selected_components_df.abs() > threshold_for_high_loadings).any(axis=1)]

    
def plot_explained_variance_for_assets_selection(loadings_matrix_df, explained_variance):

    # Print explained variance
    
    explained_variance_df = pd.DataFrame(explained_variance).T
    explained_variance_df.columns = loadings_matrix_df.columns
    print('\nexplained_variance_df\n')
    display(explained_variance_df)
    
    # Plotting the explained variance
    plt.figure(figsize=(10, 6))
    plt.bar(range(1, len(explained_variance) + 1), explained_variance, alpha=0.5, align='center', label='individual explained variance')
    plt.step(range(1, len(explained_variance) + 1), np.cumsum(explained_variance), where='mid', label='cumulative explained variance')
    plt.xlabel('Principal Components')
    plt.ylabel('Explained Variance Ratio')
    plt.title('Explained Variance by Principal Components')
    plt.legend(loc='best')
    plt.show()
    

#----------------------
def print_explained_variance(loadings_matrix_df, explained_variance,cumulative_variance_treshold, num_components, threshold_for_highest_loadings):
     # Print explained variance
    print('\nloadings_matrix_df\n')
    display(loadings_matrix_df)
    num_components = get_num_components(explained_variance,cumulative_variance_treshold)
    top_components_df = select_top_components_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
    print('\ntop_components_df\n')
    display(top_components_df)
    print('\nMost important assets with top components\n')
    top_indicators_df = select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
    display(top_indicators_df)
    
    
    
def get_all_assets_corr_matrix(log_returns_df, cumulative_variance_treshold = 1, threshold_for_highest_loadings = 0.5 ):
            
    all_assets_matrix =  generate_correlation_matrix(log_returns_df)
    return all_assets_matrix

    
def get_most_important_assets_list_PCA(log_returns_df, cumulative_variance_treshold = 1, threshold_for_highest_loadings = 0.5 ):
    
    loadings_matrix_df, explained_variance = setting_PCA_for_assets_selection(log_returns_df)
    #print('\nloadings_matrix_df\n')
    #display(loadings_matrix_df)
    num_components = get_num_components(explained_variance,cumulative_variance_treshold)
    top_components_df = select_top_components_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
    #print('\ntop_components_df\n')
    #display(top_components_df)
    #print('\ntop_indicators_df\n')
    top_indicators_df = select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
    #display(top_indicators_df)
    most_important_assets_list = selecting_important_item_PCA_treshold_method(top_indicators_df, threshold_for_highest_loadings)
        
    return most_important_assets_list   
  
    
def get_most_important_assets_log_returns_df_PCA(log_returns_df, cumulative_variance_treshold = 1, threshold_for_highest_loadings = 0.5 ):
    
    most_important_assets_log_returns_list = get_most_important_assets_list_PCA(log_returns_df, 
                                                                    cumulative_variance_treshold, threshold_for_highest_loadings)
                
    most_important_assets_log_returns_df = log_returns_df[most_important_assets_log_returns_list]
    #print('\n most_important_assets_df\n')    
    #display(most_important_assets_log_returns_df)
    
    return most_important_assets_log_returns_df
        
        
def get_most_important_assets_corr_matrix_PCA(log_returns_df, cumulative_variance_treshold = 1, threshold_for_highest_loadings = 0.5 ):
       
    most_important_assets_log_returns_df = get_most_important_assets_log_returns_df_PCA(log_returns_df, 
                                                                    cumulative_variance_treshold , threshold_for_highest_loadings)
        
    #PCA couple with covarience matrice to select most important portfolio assets
    most_important_assets_matrix =  generate_correlation_matrix(most_important_assets_log_returns_df)
    
    return most_important_assets_matrix

#----------------------------------------------------------------------------------------------------------
 #Stack Correlation Analysis and Principal Components Analysis(PCA) to select most divesified assets   
#-------------------------------------------------------------------------------------------------------------

def get_most_diversify_portfolio_asset_log_return_df_stack_corr_PCA(log_returns_df,most_diversify_portfolio_assets_list):
    return log_returns_df[most_diversify_portfolio_assets_list]

        
def using_PCA__and_corr_matrix_to_diversify_portfolio(log_returns_df, cumulative_variance_treshold = 1, threshold_for_highest_loadings = 0.5, 
                                              correlation_coefficient_treshold= 0.40):
    
    most_important_assets_corr_matrix = get_most_important_assets_corr_matrix_PCA(log_returns_df, cumulative_variance_treshold,
                                                                                          threshold_for_highest_loadings)
              
    most_diversify_portfolio_assets_list = selecting_important_item_corr_treshold_method(most_important_assets_corr_matrix,
                                                                                              correlation_coefficient_treshold)
    
    most_diversify_portfolio_assets_df = log_returns_df[most_diversify_portfolio_assets_list]
    most_diversify_portfolio_assets_corr_matrix = generate_correlation_matrix(most_diversify_portfolio_assets_df)
    
    return most_diversify_portfolio_assets_corr_matrix
    
def plotting_selected_assets_corr_mat_clustermap(assets_matrix, title, dendrogram = True):
             
    g = sns.clustermap(assets_matrix,  method = 'ward', metric='euclidean', cmap   = 'RdBu',  annot  = True,  annot_kws = {'size': 8},
                      row_cluster=dendrogram, col_cluster=dendrogram)
    g.fig.suptitle('Diversified Portfolio Assets Log Returns Correlation Matrix Cluster Map using PCA', y=0.9, fontsize=12)
    plt.subplots_adjust(top=0.85)
    plt.setp(g.ax_heatmap.get_xticklabels(), rotation=90)
    plt.setp(g.ax_heatmap.get_yticklabels(), rotation=360)
    g.cax.set_position([1.02, 0.2, 0.03, 0.4])  # [left, bottom, width, height]
    g.cax.set_ylabel('Correlation Coefficient', rotation=270, labelpad=15)  # Rotate label
    g.fig.suptitle(title, y=0.9, fontsize=12)


 #----------------------------------------------------------------------------------------------------------
 #                              Most divesified Assets Daily Volatility   
#-------------------------------------------------------------------------------------------------------------
#selected assets daily volatility
def get_selected_assets_volatility_df_from_Stack_Corr_PCA_method(selected_assets_adj_close_price_log_return_df, frequency_date_column = 'day'):
                          
    frequency = frequency_date_column[0].upper()
    
    #selected_assets_adj_close_price_log_return_df = get_most_important_assets_log_returns_df_PCA(log_returns) 
    
    #Market volatility
        
    selected_assets_volatility_df = selected_assets_adj_close_price_log_return_df.rolling(center=False,window= 252).std() * np.sqrt(252)
    for col in list(selected_assets_volatility_df.columns):
        selected_assets_volatility_df = selected_assets_volatility_df.rename(columns={col: col+' Volatility'})
    
    selected_assets_volatility_df = selected_assets_volatility_df.dropna(axis=0)
    
    if frequency == 'D':
        selected_assets_volatilities = selected_assets_volatility_df
    else:
        selected_assets_volatility_df[frequency_date_column] = pd.to_datetime(selected_assets_volatility_df.index, format = '%m/%Y')
        selected_assets_volatility_df[frequency_date_column] = selected_assets_volatility_df[frequency_date_column].dt.to_period(frequency)
        
        #market_adj_close_price_log_return_frequency_df = market_volatility_df
        selected_assets_volatility_df.set_index(frequency_date_column, inplace=True)
        selected_assets_volatilities = selected_assets_volatility_df.groupby(frequency_date_column).mean()
        selected_assets_volatilities = round(selected_assets_volatilities,1)
        selected_assets_volatilities = selected_assets_volatilities.dropna(axis=0)
        
    return selected_assets_volatilities

#--------------------------------------------------MAIN FUNCION --------------------------------------------------------------
def select_most_important_Portfolio_assets_and_diversification_stack_corr_PCA_mathod_main_function():
    cumulative_variance_treshold = 1.0
    threshold_for_highest_loadings = 0.5
    correlation_coefficient_treshold = 0.3                                                               
    #-------------------------------------
    loadings_matrix_df, explained_variance  = setting_PCA_for_assets_selection(log_returns)
    num_components = get_num_components(explained_variance,cumulative_variance_treshold)
    top_components_df = select_top_components_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
    top_indicators_df = select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
    most_important_assets_list = selecting_important_item_PCA_treshold_method(top_indicators_df, threshold_for_highest_loadings)
    #------------------------------------
    plot_explained_variance_for_assets_selection(loadings_matrix_df, explained_variance)
    print_explained_variance(loadings_matrix_df, explained_variance,cumulative_variance_treshold, num_components, threshold_for_highest_loadings)
    most_important_assets_log_returns_df_PCA = get_most_important_assets_log_returns_df_PCA(log_returns, cumulative_variance_treshold , 
                                                                                        threshold_for_highest_loadings)
    
    most_important_assets_corr_matrix_PCA = get_most_important_assets_corr_matrix_PCA(log_returns, cumulative_variance_treshold,
                                                                                  threshold_for_highest_loadings )
    
    most_diversify_portfolio_assets_list = selecting_important_item_corr_treshold_method(most_important_assets_corr_matrix_PCA,
                                                                                              correlation_coefficient_treshold)
    
    most_diversify_portfolio_assets_log_returns_df = get_most_diversify_portfolio_asset_log_return_df_stack_corr_PCA(log_returns,
                                                                                                         most_diversify_portfolio_assets_list)

    most_diversify_portfolio_assets_corr_matrix = using_PCA__and_corr_matrix_to_diversify_portfolio(log_returns, cumulative_variance_treshold ,
                                                  threshold_for_highest_loadings, correlation_coefficient_treshold )
    
    selected_assets_volatility_df_stack_corr_PCA_method = \
                    get_selected_assets_volatility_df_from_Stack_Corr_PCA_method(most_diversify_portfolio_assets_log_returns_df, 
                                                                                            frequency_date_column = 'day')
    
#-------graphs and tables--------------------------------------------------
    print('\nMost Important Assets Log returns  using PCA\n')
    display(most_important_assets_log_returns_df_PCA)
    plotting_selected_assets_corr_mat_clustermap(most_important_assets_corr_matrix_PCA, 'Most Important Assets Correlation Matrix PCA Method')
    print('\nMost Diversified Assets Log returns  using Stack Correlation Matrix/PCA Method\n')
    display(most_diversify_portfolio_assets_log_returns_df)
    plotting_selected_assets_corr_mat_clustermap(most_diversify_portfolio_assets_corr_matrix, 
                                                 'Most Diversified Assets Correlation Matrix - stacking Correlation Analysis/PCA Method')
    print('\nDiversified Portfolio Assets Volatility \n')
    display(selected_assets_volatility_df_stack_corr_PCA_method)

select_most_important_Portfolio_assets_and_diversification_stack_corr_PCA_mathod_main_function()   
explained_variance_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 ... PC67 PC68 PC69 PC70 PC71 PC72 PC73 PC74 PC75 PC76
0 0.413571 0.092048 0.042029 0.033425 0.028264 0.022387 0.016672 0.016169 0.012889 0.012505 ... 0.001654 0.001622 0.001451 0.0014 0.001346 0.001299 0.000955 0.000908 0.000633 0.000318

1 rows × 76 columns

loadings_matrix_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 ... PC67 PC68 PC69 PC70 PC71 PC72 PC73 PC74 PC75 PC76
AEM -0.061928 -0.300673 -0.018603 -0.033758 -0.043640 0.041924 0.078728 0.048601 -0.074202 -0.031446 ... -0.150138 0.010027 -0.125825 -0.016254 0.082313 0.017581 -0.009547 -0.003375 -0.011353 0.018102
AGI -0.054214 -0.317213 -0.019314 -0.045093 -0.045763 0.090384 -0.002040 -0.005456 -0.035734 0.009407 ... 0.181376 -0.121596 0.062707 -0.123947 -0.047106 0.035094 -0.061341 -0.021900 -0.012963 -0.019846
ATS -0.074365 0.000179 0.016299 0.124022 0.071638 -0.140825 -0.232579 0.015172 -0.192035 -0.326613 ... -0.005813 0.006769 0.005296 0.004956 -0.003904 -0.010756 0.026051 -0.019479 -0.010712 0.000526
BLX -0.107064 0.056913 -0.040209 0.044130 -0.129863 0.300048 0.074065 -0.026251 -0.141236 0.094920 ... 0.030163 -0.038492 0.002709 -0.034415 0.018790 0.019224 -0.004271 -0.044736 0.023852 -0.015881
BMO -0.154090 0.072810 -0.055718 -0.052644 0.043137 -0.065054 -0.137062 0.124146 -0.044911 0.083070 ... -0.036197 0.120071 0.275386 -0.716781 0.185395 -0.152984 -0.059495 -0.056711 0.032720 -0.002581
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
TRP -0.135361 0.025675 -0.085641 -0.110914 -0.078325 -0.136525 0.097596 0.112444 0.060010 0.160832 ... -0.105272 0.342310 0.055739 0.120079 0.022928 0.040598 0.035714 0.035335 -0.039161 -0.005182
WCN -0.101385 0.003086 0.169273 -0.183968 0.079093 -0.059743 0.243405 -0.027527 0.027130 0.038142 ... -0.057424 -0.042657 0.041802 0.003885 -0.026465 0.054365 -0.008726 0.002437 0.000136 -0.012959
WFG -0.111772 -0.004493 -0.017282 0.011134 0.003261 -0.047796 -0.191205 0.068690 0.132980 -0.215008 ... 0.005610 0.005589 0.007621 -0.025308 0.021077 0.000838 -0.010475 -0.021437 0.006819 0.010910
WPM -0.066090 -0.309043 0.011015 -0.047076 0.025778 -0.029112 -0.026209 0.010502 -0.065610 -0.020476 ... 0.194700 0.047612 0.001477 -0.209939 0.020964 0.072836 0.010258 -0.005362 -0.329228 -0.018788
X -0.092889 -0.002042 -0.105657 0.064158 0.117769 0.187691 -0.118618 -0.192140 0.180294 -0.017633 ... -0.004554 -0.021461 0.017062 0.020064 -0.034882 -0.005765 0.006582 -0.034042 -0.000958 0.008777

76 rows × 76 columns

top_components_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 ... PC67 PC68 PC69 PC70 PC71 PC72 PC73 PC74 PC75 PC76
AEM -0.061928 -0.300673 -0.018603 -0.033758 -0.043640 0.041924 0.078728 0.048601 -0.074202 -0.031446 ... -0.150138 0.010027 -0.125825 -0.016254 0.082313 0.017581 -0.009547 -0.003375 -0.011353 0.018102
AGI -0.054214 -0.317213 -0.019314 -0.045093 -0.045763 0.090384 -0.002040 -0.005456 -0.035734 0.009407 ... 0.181376 -0.121596 0.062707 -0.123947 -0.047106 0.035094 -0.061341 -0.021900 -0.012963 -0.019846
ATS -0.074365 0.000179 0.016299 0.124022 0.071638 -0.140825 -0.232579 0.015172 -0.192035 -0.326613 ... -0.005813 0.006769 0.005296 0.004956 -0.003904 -0.010756 0.026051 -0.019479 -0.010712 0.000526
BLX -0.107064 0.056913 -0.040209 0.044130 -0.129863 0.300048 0.074065 -0.026251 -0.141236 0.094920 ... 0.030163 -0.038492 0.002709 -0.034415 0.018790 0.019224 -0.004271 -0.044736 0.023852 -0.015881
BMO -0.154090 0.072810 -0.055718 -0.052644 0.043137 -0.065054 -0.137062 0.124146 -0.044911 0.083070 ... -0.036197 0.120071 0.275386 -0.716781 0.185395 -0.152984 -0.059495 -0.056711 0.032720 -0.002581
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
TRP -0.135361 0.025675 -0.085641 -0.110914 -0.078325 -0.136525 0.097596 0.112444 0.060010 0.160832 ... -0.105272 0.342310 0.055739 0.120079 0.022928 0.040598 0.035714 0.035335 -0.039161 -0.005182
WCN -0.101385 0.003086 0.169273 -0.183968 0.079093 -0.059743 0.243405 -0.027527 0.027130 0.038142 ... -0.057424 -0.042657 0.041802 0.003885 -0.026465 0.054365 -0.008726 0.002437 0.000136 -0.012959
WFG -0.111772 -0.004493 -0.017282 0.011134 0.003261 -0.047796 -0.191205 0.068690 0.132980 -0.215008 ... 0.005610 0.005589 0.007621 -0.025308 0.021077 0.000838 -0.010475 -0.021437 0.006819 0.010910
WPM -0.066090 -0.309043 0.011015 -0.047076 0.025778 -0.029112 -0.026209 0.010502 -0.065610 -0.020476 ... 0.194700 0.047612 0.001477 -0.209939 0.020964 0.072836 0.010258 -0.005362 -0.329228 -0.018788
X -0.092889 -0.002042 -0.105657 0.064158 0.117769 0.187691 -0.118618 -0.192140 0.180294 -0.017633 ... -0.004554 -0.021461 0.017062 0.020064 -0.034882 -0.005765 0.006582 -0.034042 -0.000958 0.008777

76 rows × 76 columns

Most important assets with top components

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 ... PC67 PC68 PC69 PC70 PC71 PC72 PC73 PC74 PC75 PC76
AGI -0.054214 -0.317213 -0.019314 -0.045093 -0.045763 0.090384 -0.002040 -0.005456 -0.035734 0.009407 ... 0.181376 -0.121596 0.062707 -0.123947 -0.047106 0.035094 -0.061341 -0.021900 -0.012963 -0.019846
ATS -0.074365 0.000179 0.016299 0.124022 0.071638 -0.140825 -0.232579 0.015172 -0.192035 -0.326613 ... -0.005813 0.006769 0.005296 0.004956 -0.003904 -0.010756 0.026051 -0.019479 -0.010712 0.000526
BMO -0.154090 0.072810 -0.055718 -0.052644 0.043137 -0.065054 -0.137062 0.124146 -0.044911 0.083070 ... -0.036197 0.120071 0.275386 -0.716781 0.185395 -0.152984 -0.059495 -0.056711 0.032720 -0.002581
BN -0.148970 0.045362 0.071590 -0.013753 0.065388 -0.014771 -0.054065 0.124413 0.021003 0.043324 ... -0.031771 0.060318 -0.138004 0.000106 0.006984 -0.144015 -0.124427 -0.053054 0.019647 0.005267
BTO -0.129136 0.092535 -0.022307 -0.017024 -0.121852 0.114454 -0.197616 0.060695 0.035904 0.007855 ... -0.080556 0.037866 -0.024051 0.022130 -0.059100 -0.022345 0.011791 -0.055302 -0.027939 -0.018902
CIX -0.055212 -0.016288 0.002035 0.029668 -0.086048 0.204812 -0.014785 -0.145424 0.258110 0.389522 ... 0.001290 -0.008863 0.013643 -0.003692 -0.022874 -0.021235 0.007475 -0.010454 -0.023243 -0.000405
CNQ -0.129724 0.049238 -0.268474 0.045882 0.044639 -0.134181 0.188874 -0.090676 -0.033049 -0.005931 ... 0.162870 0.072036 0.008063 0.088833 -0.010372 -0.659676 -0.072624 0.160442 0.091158 -0.003918
CWB -0.143854 0.000884 0.170220 0.152204 0.038359 -0.093760 0.020776 -0.135562 0.059152 0.034551 ... 0.292793 -0.227813 -0.215333 0.106362 0.269248 0.111149 -0.184335 0.084492 -0.041895 -0.019573
DOL -0.161772 0.002449 0.044724 -0.026729 0.082273 -0.017104 -0.027028 -0.073629 -0.150492 0.079942 ... -0.021253 0.046153 -0.002184 0.018226 -0.002938 -0.015618 -0.053428 -0.044170 0.003260 0.749315
DOO -0.158596 0.001864 0.037617 -0.037232 0.076613 0.000041 -0.053165 -0.080575 -0.155516 0.069618 ... -0.059032 0.056226 -0.015360 0.031245 0.031729 -0.045749 0.029585 0.007027 -0.057994 -0.649662
ENB -0.146013 0.035765 -0.100768 -0.085941 -0.050137 -0.157815 0.101209 0.051884 -0.003982 0.117162 ... 0.146058 -0.584737 0.018011 -0.156185 0.016373 -0.028183 -0.084224 -0.046819 0.035148 -0.015616
IGM -0.128902 -0.003623 0.249091 0.142736 0.183396 -0.038799 0.104929 -0.125032 -0.036014 0.091382 ... 0.080119 0.009791 -0.067273 -0.052730 0.005293 -0.142015 0.713820 -0.330229 0.085619 0.009130
PEY -0.151227 0.051928 -0.020777 -0.197706 0.037814 0.166523 -0.050712 -0.111970 0.039265 -0.035241 ... -0.034462 -0.054046 -0.090142 -0.114874 0.082819 0.068073 0.378847 0.774294 -0.107697 0.067495
SIL -0.081061 -0.311668 -0.026162 0.018422 -0.015973 0.034826 -0.056791 -0.013075 -0.063445 -0.010083 ... 0.017741 0.066615 -0.052449 0.025368 -0.006615 0.115071 -0.014371 0.117720 0.807479 -0.038236
SLF -0.153174 0.048941 -0.005763 -0.088314 0.053784 -0.044194 -0.094464 0.034118 -0.024130 0.021328 ... -0.232536 0.048768 -0.353892 -0.070497 0.116209 -0.161623 -0.026336 0.033871 0.026828 0.012075
TD -0.150345 0.067267 -0.058720 -0.102146 0.065792 -0.045129 -0.167000 0.058571 -0.045571 0.099347 ... -0.009191 0.039243 -0.091276 0.040564 -0.494215 -0.009929 -0.007028 0.038986 -0.001102 -0.027592
WFG -0.111772 -0.004493 -0.017282 0.011134 0.003261 -0.047796 -0.191205 0.068690 0.132980 -0.215008 ... 0.005610 0.005589 0.007621 -0.025308 0.021077 0.000838 -0.010475 -0.021437 0.006819 0.010910

17 rows × 76 columns

Most Important Assets Log returns  using PCA

AGI ATS BMO BN BTO CIX CNQ CWB DOL DOO ENB IGM PEY SIL SLF TD WFG
Date
2019-09-05 -0.045142 0.000000 0.016443 0.012316 0.028079 -0.053481 0.026257 0.003223 0.005728 0.006324 0.012107 0.020621 0.003369 -0.046697 0.018389 0.009185 -0.006081
2019-09-06 -0.041243 0.000000 0.008908 0.007131 -0.002579 0.006930 -0.016591 0.000946 0.001317 0.001078 0.007894 -0.003107 0.005032 -0.030040 0.008717 0.010911 0.000000
2019-09-09 -0.025896 0.009316 0.013638 -0.008073 0.027067 -0.002766 0.033720 -0.002840 0.004378 0.000667 0.001746 -0.005562 0.011093 -0.019287 0.010036 0.005411 0.030604
2019-09-10 -0.001545 0.000000 0.018039 -0.011756 0.018675 0.020563 0.037696 -0.001139 0.005664 0.006024 0.009259 -0.007008 0.010971 0.006131 0.004865 0.011802 0.053207
2019-09-11 0.000000 0.000000 0.002630 0.007411 0.011954 0.066275 -0.010176 0.006622 0.005199 0.006077 0.001439 0.010132 0.013547 0.013826 0.003461 0.001599 0.005594
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-08-26 -0.002552 0.001841 0.005535 0.005483 0.009959 -0.012112 0.026545 -0.000945 -0.003348 -0.002865 0.005783 -0.009842 0.003697 0.000887 0.001284 -0.002868 0.006889
2024-08-27 -0.004096 -0.004425 -0.063600 0.008469 -0.011477 0.042652 -0.012643 0.001214 0.005204 0.003964 -0.005530 0.003863 -0.006943 -0.001183 0.002563 0.008581 -0.015061
2024-08-28 -0.019690 -0.011896 -0.017194 -0.006649 0.009674 0.003180 -0.011435 -0.004189 -0.003714 -0.003118 -0.007591 -0.013042 0.002783 -0.032780 -0.002906 -0.004365 -0.006654
2024-08-29 0.003135 -0.000374 0.013104 0.004437 0.002104 0.037727 0.010893 0.002570 0.003343 0.001844 0.004814 -0.000217 0.002775 0.005792 0.004255 0.000336 0.006317
2024-08-30 0.005722 0.004479 0.007924 0.011804 0.004194 0.014500 -0.019418 0.003371 0.001667 0.000102 0.015551 0.010579 0.006904 -0.005181 0.006440 0.007875 -0.004621

1256 rows × 17 columns

Most Diversified Assets Log returns  using Stack Correlation Matrix/PCA Method

AGI ATS BMO BN BTO CIX CNQ CWB DOL DOO ENB IGM PEY SIL SLF TD WFG
Date
2019-09-05 -0.045142 0.000000 0.016443 0.012316 0.028079 -0.053481 0.026257 0.003223 0.005728 0.006324 0.012107 0.020621 0.003369 -0.046697 0.018389 0.009185 -0.006081
2019-09-06 -0.041243 0.000000 0.008908 0.007131 -0.002579 0.006930 -0.016591 0.000946 0.001317 0.001078 0.007894 -0.003107 0.005032 -0.030040 0.008717 0.010911 0.000000
2019-09-09 -0.025896 0.009316 0.013638 -0.008073 0.027067 -0.002766 0.033720 -0.002840 0.004378 0.000667 0.001746 -0.005562 0.011093 -0.019287 0.010036 0.005411 0.030604
2019-09-10 -0.001545 0.000000 0.018039 -0.011756 0.018675 0.020563 0.037696 -0.001139 0.005664 0.006024 0.009259 -0.007008 0.010971 0.006131 0.004865 0.011802 0.053207
2019-09-11 0.000000 0.000000 0.002630 0.007411 0.011954 0.066275 -0.010176 0.006622 0.005199 0.006077 0.001439 0.010132 0.013547 0.013826 0.003461 0.001599 0.005594
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-08-26 -0.002552 0.001841 0.005535 0.005483 0.009959 -0.012112 0.026545 -0.000945 -0.003348 -0.002865 0.005783 -0.009842 0.003697 0.000887 0.001284 -0.002868 0.006889
2024-08-27 -0.004096 -0.004425 -0.063600 0.008469 -0.011477 0.042652 -0.012643 0.001214 0.005204 0.003964 -0.005530 0.003863 -0.006943 -0.001183 0.002563 0.008581 -0.015061
2024-08-28 -0.019690 -0.011896 -0.017194 -0.006649 0.009674 0.003180 -0.011435 -0.004189 -0.003714 -0.003118 -0.007591 -0.013042 0.002783 -0.032780 -0.002906 -0.004365 -0.006654
2024-08-29 0.003135 -0.000374 0.013104 0.004437 0.002104 0.037727 0.010893 0.002570 0.003343 0.001844 0.004814 -0.000217 0.002775 0.005792 0.004255 0.000336 0.006317
2024-08-30 0.005722 0.004479 0.007924 0.011804 0.004194 0.014500 -0.019418 0.003371 0.001667 0.000102 0.015551 0.010579 0.006904 -0.005181 0.006440 0.007875 -0.004621

1256 rows × 17 columns

Diversified Portfolio Assets Volatility 

AGI Volatility ATS Volatility BMO Volatility BN Volatility BTO Volatility CIX Volatility CNQ Volatility CWB Volatility DOL Volatility DOO Volatility ENB Volatility IGM Volatility PEY Volatility SIL Volatility SLF Volatility TD Volatility WFG Volatility
Date
2020-09-02 0.722088 0.380698 0.494260 0.510384 0.780097 0.622903 0.815006 0.255488 0.307450 0.295969 0.471170 0.351598 0.387438 0.545365 0.445792 0.435493 0.686560
2020-09-03 0.720578 0.383126 0.494137 0.510851 0.779591 0.620621 0.814673 0.258030 0.307956 0.296476 0.471263 0.355219 0.387536 0.543424 0.446122 0.435766 0.689481
2020-09-04 0.719338 0.383447 0.494077 0.511048 0.779606 0.620899 0.814549 0.258336 0.307954 0.296514 0.471448 0.355847 0.387522 0.542502 0.446038 0.435661 0.689529
2020-09-08 0.719311 0.383335 0.494701 0.511587 0.779410 0.621174 0.819958 0.260213 0.308082 0.296812 0.471670 0.358251 0.387716 0.542682 0.446219 0.436037 0.690172
2020-09-09 0.719990 0.389331 0.494570 0.511666 0.779173 0.620827 0.819753 0.260624 0.308533 0.297460 0.471982 0.358998 0.387579 0.543947 0.446900 0.436161 0.688212
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-08-26 0.336985 0.328255 0.216709 0.289643 0.260475 0.772576 0.276750 0.084813 0.121523 0.121095 0.172435 0.208262 0.165861 0.333416 0.180709 0.187160 0.302854
2024-08-27 0.336736 0.327967 0.225907 0.289722 0.260503 0.773628 0.276778 0.084779 0.121467 0.121064 0.172460 0.208210 0.166029 0.333366 0.180717 0.187258 0.303123
2024-08-28 0.334901 0.328120 0.226021 0.289590 0.260545 0.773491 0.276080 0.084880 0.121071 0.120928 0.172492 0.208574 0.165680 0.334371 0.180722 0.186212 0.303210
2024-08-29 0.334237 0.327515 0.226337 0.288030 0.260516 0.774304 0.275127 0.084525 0.120509 0.120158 0.172256 0.207535 0.164891 0.333743 0.179847 0.185355 0.303163
2024-08-30 0.334262 0.327472 0.226044 0.288189 0.260478 0.774409 0.275850 0.084472 0.120504 0.120157 0.172824 0.207681 0.164875 0.333793 0.179833 0.185476 0.303192

1005 rows × 17 columns

4. Asset Pricing, Profit & Lost simation and Risk calculation¶

In this section, we will focus on :¶
  • ##### Monte Carlo simulation of stock price using cov matrix and cholesky decomposition
  • ##### Profit & Lost simulation
  • ##### VaR/CVaR calculation under current macroeconomic factors

Before going forward with our analysis, it is crucial to understand the form of the data distribution, here stock price and asset returns distribution over time in other to choose the appropriate model. The daily adjust closed prices charts above show that the stock prices movement and their returns over time folow an independent random process, with stock prices always positive and ncrease indefinitely. The distribution exhibits positive skewnes.The future price movement doesn't depend on its history, but it is determined by both its current state and some inherent randomness such as economic indicators, company performance, investor sentiment, geopolitical events, and unforeseen news. Therefore, the stock prices are considered stochastic. In general we don’t know the distribution of the stock prices, we only know that it is closed to the brownan motion stockastic process. Typically, the logarithm of the stock price follows a Brownian motion with drift.

Stochastic differential equation (SDE) of the stock price 𝑆(𝑡): $$ dS(t) = \mu S(t) \, d(t) + \sigma S(t) \, dW(t) $$ where:

  • $dS(t)$ is the variation (absolute change) of the stock price over the time interval d(t)
  • $\mu$ is the drift term (expected return),
  • $\sigma$ is the volatility of the stock,
  • $W(t)$ is a Wiener process (Geometric Brownian motion).

In a simple term, $dS(t) = S(t+d(t))$ , with $S(t)$ representing the stock price at the time $t$. logarithmicly the continuous compounding return of the stock over the interval $d(t)$ is $ r(t) = \log\left(\frac{S(t+d(t))}{S(t)}\right)$.
The volatility $\sigma$ is the square root of the variance. It provides a measure of the risk or uncertainty of the stock price. $\sigma$ is a key parameter of the Geometric Brownian motion that determines the stochastic variation of the stock price.
The variance of the stock price over a given period is a measure of the magnitude of expected price fluctuations. It is the mathematical expectation of the squared deviation between the price and its mean.
$W(t)$ is a random variable that follows a Wiener process. It is the random component of the stock price movement and is related to the variation of time d(t). The mathematical expression of this relationship is : $dW(t)=\epsilon \sqrt{dt}$. In this expression, the term "epsilon" represents a random variable whose distribution is normal with expected value of zero(mean zero) and variance equal 1. It's mathematical expection is $E(dW(t))=\sqrt{dt}E(epsilon)=0$ with the variance $Var(dW(t))=d(t)Var(epsilon)=d(t)$.
The variations of $W(t)$ are independent over time. In the case the company associated with that stock does not distribute a portion of its profits to shareholders in the form of dividend payments, the stochastic equation for the return of a stock is:$\frac{dS(t)}{S(t)} = \mu d(t) + \sigma dW(t)$. Therefore, it makes sense that the return on a stock does not depend on the price of the stock.

Let's now focus on the stochastic equation for the return. We will dig into this yow part : $\mu d(t) and \sigma dW(t)$.
The first part, $\mu d(t)$, is deterministic meaning that, using the historical data, we can calculate the expected change in the stock price over the small time interval $dt$, assuming no randomness. Essentially, $mu$ is the expected rate of return per unit time, and when multiplied by the stock price $S(t)$ and the time interval $dt$, it gives the expected change in the stock price due to predictable factors like steady growth, interest rates, or dividends.
Tthe second part, $\sigma dW(t)$, is the Stochastic or randomness part of the stock rate of return. It takes in to considaration the unpredictable fluctuations in the stock price due to various factors like market volatility, company-Specific news, or economic factors, geopolitical events,natural disasters and pandemic, investor behavior and sentiment, technological advances and disruptions, global economic interdependencies. The $dW(t)$ represents the random shock to the stock price, and $sigma$ scales this shock, making it more or less volatile. Let's look inside the solution of the stochatic equation of the stock price.

Quation1: $S(t) = S(0) \exp \left( \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t) \right)$

or in more details

Quation2: $S(t) = S(0) \exp \left( \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma \phi \sqrt{dt} \right)$

In the equation2, the term $ \phi$ represent correlated normal distributions with standard deviation equal 1 and expected value of zero(mean zero). But from Geometric Brownian Motion prostective, the stock price movement is independent over time(uncorrelated) and follow a log-normal distribution with a mean of zero and a standard deviation that depends on the time interval dt. In order to come out of this situation, we will procide as follow:

  • Calculate log returns of the stock prices
  • Calculate the expected return, the variance and the volatility of each stock
  • Calculate the variance-covariance matrix
  • Calculate cholesky decomposition matrix.
  • Simulate an uncorrelated random normal distribution with $mu = 0 and sigma = 1$(Z distribution)
  • Apply the cholesky matrix to the uncorrelated random normal distribution(Z distribution) in order to get a correlated random normal normal distribution with with $mu = 0 and sigma = 1$.
  • Use correlated random normal normal distribution as input for the stock price function.

After then, we will simulate the portfolio Profit & Lost and finanly we will calculate the portfolio VaR(value at Rick and the CVaR(conditional Value at Risk)

In [11]:
# Data collection- 
cumulative_variance_treshold = 1.0
threshold_for_highest_loadings = 0.5
correlation_coefficient_treshold = 0.3                                                                    
#-------------------------------------
loadings_matrix_df, explained_variance  = setting_PCA_for_assets_selection(log_returns)
num_components = get_num_components(explained_variance,cumulative_variance_treshold)
top_components_df = select_top_components_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
top_indicators_df = select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
most_important_assets_list = selecting_important_item_PCA_treshold_method(top_indicators_df, threshold_for_highest_loadings) 
#------------------------------------ 
#plot_explained_variance_for_assets_selection(loadings_matrix_df, explained_variance)
#print_explained_variance(loadings_matrix_df, explained_variance,cumulative_variance_treshold, num_components, threshold_for_highest_loadings)
most_important_assets_log_returns_df_PCA = get_most_important_assets_log_returns_df_PCA(log_returns, cumulative_variance_treshold , 
                                                                                        threshold_for_highest_loadings)
            
most_important_assets_corr_matrix_PCA = get_most_important_assets_corr_matrix_PCA(log_returns, cumulative_variance_treshold,
                                                                                  threshold_for_highest_loadings )
    
most_diversify_portfolio_assets_list = selecting_important_item_corr_treshold_method(most_important_assets_corr_matrix_PCA,
                                                                                              correlation_coefficient_treshold)
    
most_diversify_portfolio_assets_log_returns_df = get_most_diversify_portfolio_asset_log_return_df_stack_corr_PCA(log_returns,
                                                                                                       most_diversify_portfolio_assets_list)

most_diversify_portfolio_assets_corr_matrix = using_PCA__and_corr_matrix_to_diversify_portfolio(log_returns, cumulative_variance_treshold ,
                                                  threshold_for_highest_loadings, correlation_coefficient_treshold )
    
selected_assets_volatility_df_stack_corr_PCA_method = \
                    get_selected_assets_volatility_df_from_Stack_Corr_PCA_method(most_diversify_portfolio_assets_log_returns_df, 
                                                                                            frequency_date_column = 'day')

most_diversify_portfolio_assets_initial_prices =  stocks_initial_prices[most_diversify_portfolio_assets_list]
#most_diversify_portfolio_assets_list

    

Correlation - Covariance & Cholesky decomposition¶

  • Covariance: Covariance measures the degree to which two variables (e.g., asset returns) move together. It tells us whether the returns of two assets tend to rise and fall together (positive covariance) or move in opposite directions (negative covariance). Zero Covariance means that there is no linear relationship between the assets' returns. A mix of assets with low or negative covariances can reduce overall portfolio risk.
  • Mathematical Formula:
    $$ Cov(X, Y) = [ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) ] $$ Where:

    • $( X_i )$ and $( Y_i )$ are the returns of assets $(X)$ and $(Y)$.
    • $( \bar{X} )$ and $( \bar{Y} )$ are the mean returns of $(X)$ and $(Y)$.
    • \$( n )$ is the number of observations.
  • Correlation: Correlation is a normalized version of covariance, which measures the strength and direction of the linear relationship between two variables (asset returns). Unlike covariance, correlation is dimensionless and always ranges between -1 and 1. Correlation is used to measure the degree of diversification in a portfolio. Combining assets with low or negative correlations can significantly reduce portfolio risk. Portfolio managers use correlation to understand how different assets are likely to behave relative to one another under various market conditions.
  • Mathematical Formula: $$ Correlation(X, Y) = [ \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} ] $$ Where:

    • $Cov(X, Y)$ is the covariance of $( X )$ and $( Y )$.
    • $( \sigma_X )$ and $( \sigma_Y )$ are the standard deviations of $( X )$ and $( Y )$.
  • Cholesky decomposition: Cholesky decomposition is a mathematical technique used of decomposing a positive-definite matrix into the product of a lower triangular matrix and its transpose. We will use Cholesky decomposition methode to decompose the covariance matrix into the product of a lower triangular matrix and its transpose(cholesky Matrix). This will help us to generate correlated asset returns and the stock price.
  • For a positive-definite matrix $( A )$, the Cholesky decomposition is expressed as: $$ A = LL^\top $$ Where:
  • $( L )$ is a lower triangular matrix.
  • $( L^\top )$ is the transpose of $( L )$.
  • Suppose you have a covariance matrix of asset returns: $$ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \dots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \dots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \dots & \sigma_{nn} \end{pmatrix} $$ The Cholesky decomposition would allow you to write:
$$\Sigma = LL^\top$$

Where:

  • \$( L )$ is used to generate correlated random variables from uncorrelated normal variables, which is essential for realistic financial simulations.
In [12]:
def plotting_heatmap_for_correlation_matrix(log_returns, title):
    plt.figure(figsize=(20, 8))
    #sns.heatmap(log_returns.corr(), annot=True)
    sns.heatmap(log_returns.corr(), annot=True, cmap='coolwarm', vmin=-1, vmax=1, linewidths=0.5, fmt=".2f")
    plt.yticks(rotation=360)
    plt.title(title, pad= 20)

def plotting_heatmap_for_covariance_matrix(covariance_matrix, title):
    plt.figure(figsize=(20, 8))
    sns.heatmap(covariance_matrix, annot=True, cmap='coolwarm', fmt=".5f",
            linewidths=0.5, vmin=covariance_matrix.min().min(), vmax=covariance_matrix.max().max())
    plt.yticks(rotation=360)
    plt.title(title, pad= 20)

    
def variance_covariance_matrix(log_returns):
    return log_returns.cov() 


#-----------------------------------------------------------------------------------------------------------------------------------------------
#create cholesky matrice: let's apply cholesky decomposition to the covarience matrix 
# Input: covarience matrice
# output: cholesky matrice data frame
#------------------------------------------------------------------------------------------------------------------------------------------------
def create_cholesky_matrix(covar_mat):
    cholesky_matrix_data = np.linalg.cholesky(covar_mat)
    return pd.DataFrame(cholesky_matrix_data[0:,0:], columns=covar_mat.columns.tolist(), index=covar_mat.columns.tolist())


covar_mat = variance_covariance_matrix(most_diversify_portfolio_assets_log_returns_df) 
cholesky_matrix_data_df = create_cholesky_matrix(covar_mat)

plotting_heatmap_for_correlation_matrix(most_diversify_portfolio_assets_log_returns_df, 
                                        'Correlation Matrix of the Most Diversified portfolio Asset Log Returns')

plotting_heatmap_for_covariance_matrix(covar_mat, 'Covariance Matrix of the Most Diversified portfolio  Asset Log Returns')
plotting_heatmap_for_covariance_matrix(cholesky_matrix_data_df, 'Cholesky Matrix of the Most Diversified portfolio Asset Log Returns')

Uncorelated normal distribution¶

Uncorrelated Normal(epsilon T) Two random variables are said to be uncorrelated if their covariance is zero. Two variables that are uncorrelated are not necessarily independent, as is simply exemplified by the fact that X and X2 are uncorrelated but not independent. However, two variables that are uncorrelated AND jointly normally distributed are guaranteed to be independent https://stats.stackexchange.com/questions/376229/uncorrelatedness-joint-normality-independence-why-intuition-and-mechanics#:~:text=Two%20variables%20that%20are%20uncorrelated,are%20guaranteed%20to%20be%20independent.

https://stackoverflow.com/questions/20626994/how-to-calculate-the-inverse-of-the-normal-cumulative-distribution-function-in-p

In [13]:
#--------------------------------------------------------------------------------------------------------
# here let's simulate 10000 uncorelated normal distribution  iterations to calculate the stock price.
#input:covariance matrice and number of iteration
#output 10000 Z score for each stock price: uncorelated normal z core array and  it's  data frame 
# here we simulate 10000 uncorelated normal distribution  iterations to calculate the stock price.
#t_intervals = 250
#number_of_assets = len(covar_mat.columns.tolist())
#Z = norm.ppf(np.random.rand(iterations,number_of_assets ))
#--------------------------------------------------------------------------------------------------------
def simulate_uncorelated_normal_distribution(covar_mat,iterations):
    number_of_assets = len(covar_mat.columns.tolist())
    #z score array
    Z = norm.ppf(np.random.rand(iterations,number_of_assets ))
    Z_df = pd.DataFrame(data=Z[0:,0:],index=[i for i in range(Z.shape[0])], columns=covar_mat.columns.tolist())
    return Z,Z_df

Z, Z_df = simulate_uncorelated_normal_distribution(covar_mat,10000)
display(Z_df)
AGI ATS BMO BN BTO CIX CNQ CWB DOL DOO ENB IGM PEY SIL SLF TD WFG
0 -0.014251 0.984673 -0.640661 0.352552 -0.755954 0.140241 1.938027 0.351559 1.425819 0.566410 -1.130777 -1.427924 0.119983 -0.035151 1.431731 0.253041 -0.757660
1 -1.584557 0.132888 -0.022475 -0.359053 0.433362 0.642457 0.848172 -0.120482 0.403287 -0.235679 1.367996 -0.608351 0.223716 0.449225 0.142766 1.121109 0.280437
2 1.339836 -0.074981 0.286637 -1.427705 0.887084 1.096874 0.877216 1.518152 -0.023004 -0.807409 -0.540780 0.993497 -0.064737 -1.635076 0.120613 0.783371 -0.411260
3 2.042191 2.554160 -0.092553 -0.940519 -0.539154 0.871144 -0.814800 -1.159940 0.683020 1.625901 -0.168646 -0.418753 -0.831418 -0.267564 -0.108139 1.265697 -0.745670
4 0.460667 1.242499 -0.543992 0.617708 -0.294869 -0.574476 -1.302641 -0.514947 -0.179331 -0.729605 -0.315258 -0.551196 -0.196510 0.440043 0.884956 -0.572972 -1.587416
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 2.285962 1.143190 1.666773 0.532412 1.102743 -0.710522 -1.217958 -1.082297 -1.439800 -0.390220 -0.258851 1.963762 -0.245368 0.245116 -0.007518 -0.323493 1.080169
9996 -0.933452 1.226422 -0.991998 1.561508 -0.183356 -0.794913 -1.173983 0.004532 0.444666 -0.384535 1.112723 -1.895295 0.312740 0.413843 -1.040315 -0.258575 2.573085
9997 -0.328184 0.758332 -0.102508 0.295453 -0.330564 0.955908 0.539927 0.505346 -0.646064 -1.538100 1.222877 0.495787 -1.100979 -0.748121 -1.073875 -1.654978 1.593181
9998 -0.293886 0.107442 1.428256 -0.851825 -0.274065 -2.167839 0.613583 1.315447 -1.104972 0.858727 0.894246 1.060963 0.020778 0.757783 0.646817 -0.243252 0.293175
9999 -0.819552 -0.773844 -0.987715 -0.167020 -1.913879 -0.357072 -1.020579 0.788049 -1.103704 1.080072 -1.159892 2.201218 0.115906 0.669816 0.807538 0.674188 0.170408

10000 rows × 17 columns

Correlated normal distribution¶

In [14]:
#-----------------------------------------------------------------------------------------------------
#Description: generate correlated normal distribution using transposed cholesky matrix and uncorelated 
#normal Z score distribution
#=MMULT(unCorrelated_normal_distribution,TRANSPOSE(cholesky_matrix))
#------------------------------------------------------------------------------------------------------
def generate_correlated_normal_distribution(cholesky_matrix_data_df,Z):
    Correlated_Normals_Z = np.matmul(Z, cholesky_matrix_data_df.T)
    Correlated_Normals_Z_arr = np.array(Correlated_Normals_Z)
    return Correlated_Normals_Z, Correlated_Normals_Z_arr
    
Correlated_Normals_Z, Correlated_Normals_Z_arr= generate_correlated_normal_distribution(cholesky_matrix_data_df,Z)
display(Correlated_Normals_Z)
AGI ATS BMO BN BTO CIX CNQ CWB DOL DOO ENB IGM PEY SIL SLF TD WFG
0 -0.000433 0.023327 -0.004859 0.003156 -0.017646 0.000687 0.040785 0.004622 0.011653 0.012442 -0.001786 -0.002495 0.003362 0.007600 0.017540 0.005148 -0.008301
1 -0.048101 -0.000311 -0.002368 -0.010031 0.005826 0.014631 0.015728 -0.003324 -0.001681 -0.002503 0.013993 -0.013458 0.001131 -0.025045 0.000725 0.009813 0.002893
2 0.040672 0.001150 0.006916 -0.012391 0.018651 0.047821 0.028297 0.013226 0.008036 0.005036 0.005002 0.020782 0.004158 0.011082 0.007770 0.010744 0.000395
3 0.061993 0.065054 0.018510 0.011930 0.007635 0.043610 0.006362 0.001888 0.014788 0.019384 0.009638 0.001728 0.006743 0.046047 0.013391 0.023005 0.005917
4 0.013984 0.030482 -0.000659 0.012116 -0.002435 -0.018094 -0.027576 -0.000943 -0.000407 -0.002153 -0.007351 -0.002267 -0.003426 0.015101 0.005626 -0.004179 -0.032156
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 0.069393 0.032115 0.040559 0.049526 0.068828 0.008496 0.021225 0.011909 0.014665 0.013923 0.022675 0.033419 0.024448 0.052189 0.026274 0.026996 0.062650
9996 -0.028336 0.027053 -0.011077 0.014160 -0.006581 -0.034278 -0.035863 0.000139 -0.000937 -0.002007 0.001828 -0.012249 -0.001147 -0.012758 -0.012302 -0.006405 0.053131
9997 -0.009962 0.017272 0.002515 0.007524 -0.001475 0.032531 0.016739 0.006982 0.000924 -0.003587 0.017175 0.011131 -0.008088 -0.012801 -0.007647 -0.011772 0.031650
9998 -0.008921 0.001906 0.025102 0.008800 0.017954 -0.070304 0.037551 0.014037 0.004767 0.006670 0.024769 0.022313 0.008877 0.008923 0.019311 0.015381 0.028035
9999 -0.024879 -0.020148 -0.023700 -0.026080 -0.066007 -0.033995 -0.053334 -0.007765 -0.021662 -0.018028 -0.038104 0.009916 -0.024527 -0.017775 -0.018233 -0.020770 -0.022374

10000 rows × 17 columns

Daily returns simulation¶

$Daily returns = e^{\left( \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma \phi \sqrt{dt} \right)}$

In [15]:
#--------------------------------------------------------------------------------------------------------------
#Description: Daily returns simulation (returns simulation = 𝒆^(((𝝁𝒊−(𝟏/𝟐)𝝈𝒊𝟐)(𝒕𝟐−𝒕𝟏)+𝝈𝒊√((𝒕𝟐−𝒕𝟏) ) 𝝓)))
#Inputs:
#   𝝓 : Correlated_Normals_Z
#   𝝓_arr : Correlated_Normals_Z_array
#   𝝁 : log_returns.mean()
#   variance: log_returns.var()
#   𝝈 : log_returns.std()
#   𝓢1 : initial_prices
#   delta_t = 1
#output: 
 #--------------------------------------------------------------------------------------------------------------   
def simulate_daily_returns(𝝓,𝝓_arr, 𝝁,𝝈,delta_t):
    daily_returns_list_df = np.zeros_like(𝝓_arr)
    daily_returns_list_df = np.exp((𝝁 - 0.5 * 𝝈** 2) * delta_t + 𝝈* delta_t ** 0.5 *𝝓)
    return daily_returns_list_df

 
#daily_returns_list_df = simulate_daily_returns(Correlated_Normals_Z,Correlated_Normals_Z_arr, log_returns.mean(),log_returns.std(),1)
daily_returns_df = simulate_daily_returns(Correlated_Normals_Z,Correlated_Normals_Z_arr, 
                                               most_diversify_portfolio_assets_log_returns_df.mean(),
                                               most_diversify_portfolio_assets_log_returns_df.std(),1)
daily_returns_df
Out[15]:
AGI ATS BMO BN BTO CIX CNQ CWB DOL DOO ENB IGM PEY SIL SLF TD WFG
0 1.000344 1.000797 1.000076 1.000315 0.999432 1.000139 1.001916 1.000359 1.000358 1.000242 1.000212 1.000569 1.000283 0.999917 1.000518 1.000202 1.000158
1 0.998898 1.000234 1.000122 1.000018 1.000120 1.000653 1.001140 1.000273 1.000196 1.000063 1.000489 1.000378 1.000250 0.999066 1.000237 1.000280 1.000477
2 1.001593 1.000268 1.000296 0.999965 1.000496 1.001876 1.001529 1.000451 1.000314 1.000153 1.000331 1.000976 1.000295 1.000008 1.000355 1.000296 1.000406
3 1.002242 1.001792 1.000513 1.000513 1.000173 1.001721 1.000850 1.000329 1.000396 1.000325 1.000413 1.000643 1.000333 1.000921 1.000448 1.000502 1.000563
4 1.000782 1.000968 1.000154 1.000517 0.999878 0.999448 0.999801 1.000299 1.000212 1.000067 1.000115 1.000573 1.000183 1.000113 1.000319 1.000046 0.999480
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 1.002467 1.001007 1.000925 1.001361 1.001969 1.000427 1.001310 1.000437 1.000394 1.000260 1.000641 1.001196 1.000593 1.001081 1.000663 1.000569 1.002178
9996 0.999497 1.000886 0.999959 1.000563 0.999756 0.998852 0.999545 1.000310 1.000205 1.000068 1.000276 1.000399 1.000217 0.999386 1.000020 1.000008 1.001907
9997 1.000055 1.000653 1.000213 1.000414 0.999906 1.001312 1.001171 1.000384 1.000228 1.000050 1.000545 1.000807 1.000115 0.999385 1.000097 0.999918 1.001295
9998 1.000087 1.000286 1.000636 1.000442 1.000475 0.997529 1.001816 1.000460 1.000274 1.000173 1.000678 1.001002 1.000364 0.999952 1.000547 1.000374 1.001192
9999 0.999602 0.999761 0.999723 0.999656 0.998016 0.998863 0.999006 1.000225 0.999954 0.999876 0.999576 1.000786 0.999873 0.999256 0.999921 0.999768 0.999758

10000 rows × 17 columns

In [16]:
#--------------------------------------------------------------------------------------------------------------------------
# Description: Stock price simulation
def stock_prices_simulation(initial_prices,daily_returns_list_df ):
    𝓢1_list = []
    𝓢1_list = initial_prices.values
    expo_r = daily_returns_list_df
    future_stock_price_list_df= pd.DataFrame(data=daily_returns_list_df[0:0:],
                                             index=[i for i in range(daily_returns_list_df.shape[0])],
                                             columns=expo_r.columns.tolist())
    for (index, column) in enumerate(expo_r):
        future_stock_price_list_df[column] = pd.DataFrame(data=𝓢1_list[index]*expo_r[column].values)
    return future_stock_price_list_df
  
future_stock_price_df = stock_prices_simulation(most_diversify_portfolio_assets_initial_prices,daily_returns_df) 
future_stock_price_df
Out[16]:
AGI ATS BMO BN BTO CIX CNQ CWB DOL DOO ENB IGM PEY SIL SLF TD WFG
0 6.902050 13.901071 54.475821 26.914950 20.777522 11.056767 8.764450 46.878844 37.546200 34.983736 24.397128 35.655474 14.198505 30.371363 34.527311 43.431187 32.464362
1 6.892070 13.893245 54.478358 26.906952 20.791823 11.062444 8.757666 46.874833 37.540140 34.977469 24.403878 35.648653 14.198040 30.345503 34.517628 43.434585 32.474700
2 6.910668 13.893729 54.487816 26.905521 20.799642 11.075969 8.761068 46.883187 37.544556 34.980630 24.400031 35.669959 14.198671 30.374122 34.521684 43.435263 32.472392
3 6.915142 13.914896 54.499628 26.920272 20.792926 11.074252 8.755132 46.877464 37.547626 34.986648 24.402015 35.658101 14.199210 30.401847 34.524921 43.444193 32.477492
4 6.905071 13.903441 54.480098 26.920385 20.786788 11.049125 8.745954 46.876035 37.540719 34.977615 24.394748 35.655616 14.197089 30.377308 34.520450 43.424395 32.442345
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 6.916696 13.903982 54.522101 26.943091 20.830260 11.059946 8.759154 46.882523 37.547570 34.984357 24.407593 35.677826 14.202903 30.406720 34.532342 43.447101 32.529936
9996 6.896206 13.902305 54.469488 26.921625 20.784262 11.042543 8.743715 46.876581 37.540478 34.977676 24.398674 35.649406 14.197565 30.355234 34.510128 43.422774 32.521131
9997 6.900054 13.899066 54.483332 26.917599 20.787374 11.069737 8.757940 46.880035 37.541323 34.977014 24.405240 35.663953 14.196117 30.355199 34.512808 43.418867 32.501269
9998 6.900272 13.893979 54.506346 26.918374 20.799217 11.027908 8.763574 46.883597 37.543070 34.981315 24.408489 35.670912 14.199655 30.372411 34.528331 43.438640 32.497928
9999 6.896930 13.886681 54.456634 26.897222 20.748085 11.042659 8.738995 46.872591 37.531060 34.970959 24.381598 35.663196 14.192690 30.351260 34.506714 43.412316 32.451371

10000 rows × 17 columns

Stock prices simulation - Protfolio Profit and Lost calculation¶

$S(t) = S(0) \exp \left( \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma \phi \sqrt{dt} \right)$

In [17]:
#Initial portfolio price
def calculate_initial_portfolio_price(𝓢1):
    return 𝓢1.values.sum()

#Portfolio price simulation
def simulated_portfolio_price(future_stock_price_df):
    simulated_portfolio_price_row_sum = []
    for i in range(len(future_stock_price_df)):
        simulated_portfolio_price_row_sum.append(future_stock_price_df.iloc[i].sum())
    simulated_portfolio_price_df = pd.DataFrame(data = simulated_portfolio_price_row_sum, columns=['portfolio_prices'])
    return simulated_portfolio_price_df

#-----------------------------------------------------------------------------------------------------------
#Description: Portfolio Profit and lost calculation
# input:simulated_portfolio_price_df,portfolio_initial_price
# output :portfolio_profit_and_lost_df
#-----------------------------------------------------------------------------------------------------------
def calculate_prtfolio_profit_and_lost(simulated_portfolio_price_df, portfolio_initial_price):
    portfolio_profit_and_lost_df = simulated_portfolio_price_df - portfolio_initial_price
    portfolio_profit_and_lost_df.columns = ['profit_&_lost']
    return portfolio_profit_and_lost_df

def set_portfolio_price_profit_and_Lost_simulation_df(simulated_portfolio_price_df, portfolio_profit_and_lost_df):
    return  pd.DataFrame({'simulated_portfolio_price': simulated_portfolio_price_df['portfolio_prices'].values,
                          'Simulated Portfolio Profit & Lost': portfolio_profit_and_lost_df['profit_&_lost'].values})
   

#index=[i for i in range(portfolio_profit_and_lost_df.shape[0])],
#                        columns=simulated_portfolio_price_df.columns, portfolio_profit_and_lost_df.columns
In [18]:
simulated_portfolio_price_df= simulated_portfolio_price(future_stock_price_df)
initial_portfolio_prices = calculate_initial_portfolio_price(most_diversify_portfolio_assets_initial_prices)
portfolio_profit_and_lost_df = calculate_prtfolio_profit_and_lost(simulated_portfolio_price_df, initial_portfolio_prices)

simulated_portfolio_price_profit_and_Lost_df = set_portfolio_price_profit_and_Lost_simulation_df(simulated_portfolio_price_df, 
                                                                                                portfolio_profit_and_lost_df)


print('initial_portfolio_prices')
display(initial_portfolio_prices)
print('\nSimulated Portfolio Prices - Profit & Lost')
display(simulated_portfolio_price_profit_and_Lost_df)
initial_portfolio_prices
477.11676597595215
Simulated Portfolio Prices - Profit & Lost
simulated_portfolio_price Simulated Portfolio Profit & Lost
0 477.246739 0.129973
1 477.197986 0.081220
2 477.314910 0.198144
3 477.391765 0.274999
4 477.197181 0.080415
... ... ...
9995 477.554099 0.437333
9996 477.209791 0.093025
9997 477.266926 0.150160
9998 477.334018 0.217252
9999 477.000961 -0.115805

10000 rows × 2 columns

Portfolio VaR (Value at Risk) and CVaR calculation¶

While VaR represents a worst-case loss associated with a probability and a time horizon, CVaR is the expected loss if that worst-case threshold is ever crossed. CVaR, in other words, quantifies the expected losses that occur beyond the VaR breakpoint. CVaR is the average loss over a specified time period of unlikely scenarios beyond the confidence level. https://www.investopedia.com/terms/c/conditional_value_at_risk.asp#:~:text=While%20VaR%20represents%20a%20worst,occur%20beyond%20the%20VaR%20breakpoint.

In [19]:
#-------------------------------------------------------------------------------------------------------------------------
# Description:sorting profit and lost ascendante; confifence level rank; Var calculation;CVar calculation
# Input: 
# Output:
#-------------------------------------------------------------------------------------------------------------------------

def calculate_portfolio_Var_and_CVar(portfolio_profit_and_lost_df, confidence_level):
    #sorting profit and lost ascendante
    lportfolio_profit_and_lost_df = portfolio_profit_and_lost_df.sort_values(by='profit_&_lost', ascending=True)
    lportfolio_profit_and_lost_df = portfolio_profit_and_lost_df.reset_index(drop=True)
    #confifence level rank ( 95% confidence lavel)
    rank = int((1-confidence_level)*len(lportfolio_profit_and_lost_df))-1
    #Var calculation
    VaR = portfolio_profit_and_lost_df.iloc[rank]['profit_&_lost']
    #CVar calculation
    port_folio_lost_beyond_VaR = portfolio_profit_and_lost_df[:rank]
    CVaR = np.average(port_folio_lost_beyond_VaR)
    return VaR, CVaR
 
#--------------------------------------------------------------------------------------------------------------------------
#Decription: Profit and lost summary statistics. Minimum lost, maximum lost, mean(moderate lost)lost standart deviation, 
#            Value at risk(Var),Conditional value-at-risk (CVaR)
#Input :portfolio_profit_and_lost_df,VaR,CVaR
#Output:
#--------------------------------------------------------------------------------------------------------------------------
def profit_and_lost_summary_statistics(portfolio_profit_and_lost_df,VaR,CVaR): 
    VaR_and_CVaR_df = pd.DataFrame([{'VaR':VaR, 'CVaR':CVaR}]).transpose()
    VaR_and_CVaR_df = VaR_and_CVaR_df.rename(columns={0:'profit_&_lost'})
    portfolio_profit_and_lost_stat_df =  portfolio_profit_and_lost_df.agg(['min', 'max', 'mean', 'std'])
    return pd.concat([portfolio_profit_and_lost_stat_df,VaR_and_CVaR_df], ignore_index=False)

# Plot a histogram
def profit_lost_summary(portfolio_profit_and_lost_df):   
    fig, ax = plt.subplots(figsize=(8, 4))
    portfolio_profit_and_lost_df.plot.kde(ax=ax, legend=True, title='Histogram: Profit & Lost')
    portfolio_profit_and_lost_df.plot.hist(density=True, ax=ax)
    ax.set_ylabel('Probability')
    ax.grid(axis='y')
    ax.set_facecolor('#d8dcd6')
    plt.show()

def summary_statistics_graph_and_table(portfolio_profit_and_lost_df):
    profit_lost_summary(portfolio_profit_and_lost_df)      
    profit_and_lost_summary_statistics_df = profit_and_lost_summary_statistics(portfolio_profit_and_lost_df,VaR,CVaR)
    display(profit_and_lost_summary_statistics_df)

    
VaR, CVaR = calculate_portfolio_Var_and_CVar(portfolio_profit_and_lost_df, 0.95)
profit_and_lost_summary_statistics_df = profit_and_lost_summary_statistics(portfolio_profit_and_lost_df,VaR,CVaR) 
summary_statistics_graph_and_table(portfolio_profit_and_lost_df)
profit_&_lost
min -0.379296
max 0.550121
mean 0.102792
std 0.135651
VaR 0.146044
CVaR 0.094615

Portfolio Optimization¶

Portfolio Expected Return and Volatility Simulation - Random Efficient Frontier¶

In [20]:
def portfolio_random_weight_array_df(assets_returns_df):
    #random portfolio weigh simulation
    number_of_assets = len(assets_returns_df.columns.tolist())
    random_array = np.random.rand(1,number_of_assets )
    random_array_df = pd.DataFrame(random_array, columns = assets_returns_df.columns.tolist())
    random_weight_df = random_array_df/random_array_df.values.sum()
    return random_weight_df
    
def portfolio_expected_Return(random_weight_df,log_returns):
    assets_expected_returns = log_returns.mean()
    weited_expected_returns = assets_expected_returns * random_weight_df
    portfolio_expected_return_ = weited_expected_returns.values.sum()
    return 100*portfolio_expected_return_

def  portfolio_volatility(varcovar,w):
    transpose_w = w.T
    σp = np.sqrt(np.matmul(np.matmul(w,varcovar),w.T))
    return 100*σp[0][0]
In [21]:
def efficient_frontiere_plot(portfolio_trails_simulation_df):
    display(portfolio_trails_simulation_df)
    #fig, ax = plt.subplots()
    portfolio_trails_simulation_df.plot(x='σp', y='E_rp', kind='scatter', figsize=(10, 6));
    plt.xlabel('Expected Volatility')
    plt.ylabel('Expected Return')
    plt.title('Random portfolios Efficient Frontier')
    
#efficient_frontiere_plot(portfolio_trails_simulation_df)
In [22]:
def generate_excess_return(log_returns_df):
    𝝁 = log_returns_df.mean()
    𝝁_list = []
    𝝁_list = 𝝁.values
    X_df= pd.DataFrame(data=log_returns_df[0:0:],
                       index=log_returns_df.index.to_list(), #[i for i in range(log_returns_df.shape[0])],
                       columns=log_returns_df.columns.tolist())
    
    assets_list = log_returns_df.columns.tolist()
    for index in range(len(assets_list)):   
            X_df[assets_list[index]] =  log_returns_df[assets_list[index]].values - 𝝁_list[index]
    return X_df

#Portfolio Statistics
def portfolio_arihtmetics(log_returns_df,index_adj_close_price_df):
    return pd.DataFrame({'mu expected_return':log_returns_df.mean(),
                        'variance':log_returns_df.var(),
                        'Sigmas(volatilities)':log_returns_df.std(),
                        'modifiy shape(Er)/𝝈':log_returns_df.mean()/log_returns_df.std(),
                        'initial price':index_adj_close_price_df.iloc[0]}).transpose()


def excess_return_varcovar(X_df): 
    return X_df.cov()

def get_uncorrelated_assets_index_adj_close_price_df(index_adj_close_price_df, uncorrelated_assets_list):
    return index_adj_close_price_df[uncorrelated_assets_list]

def uncorrelated_assets_arithmetics_summary(index_adj_close_price_df, log_returns, 
                                            most_diversify_portfolio_assets_list, top_modify_shape_ratio):
    
    #uncorrelated_assets_index_adj_close_price_df = get_uncorrelated_assets_index_adj_close_price_df(index_adj_close_price_df, 
    #                                                                selecting_uncorrelated_assets(log_returns,threshold))
    
    uncorrelated_assets_index_adj_close_price_df = get_uncorrelated_assets_index_adj_close_price_df(index_adj_close_price_df, 
                                                                     most_diversify_portfolio_assets_list)
    
    #uncorrelated_assets_log_returns_df = uncorrelated_assets_returns_log_returns_df(log_returns, selecting_uncorrelated_assets(log_returns,threshold))
    uncorrelated_assets_log_returns_df = uncorrelated_assets_returns_log_returns_df(log_returns, most_diversify_portfolio_assets_list)
    
    portfolio_arihtmetics_df = portfolio_arihtmetics(uncorrelated_assets_log_returns_df, uncorrelated_assets_index_adj_close_price_df)
    portfolio_arihtmetics_df_T = portfolio_arihtmetics_df.transpose()
    portfolio_arihtmetics_df_T = portfolio_arihtmetics_df_T.sort_values(by='modifiy shape(Er)/𝝈',ascending=False)
    modify_shape_ratio_sort_assets_ticker_list = portfolio_arihtmetics_df_T.index.tolist()
    uncorrelated_assets_index_adj_close_price_by_return_df = get_uncorrelated_assets_index_adj_close_price_df(index_adj_close_price_df, 
                                                                                                         modify_shape_ratio_sort_assets_ticker_list)
     
    return portfolio_arihtmetics_df_T, uncorrelated_assets_index_adj_close_price_by_return_df
#--------------------------------------------
In [23]:
#------execution time approx 5mn
def uncorelated_portfolio_trals_simulation(log_returns, most_diversify_portfolio_assets_list, trial):
    
    σp_list = []
    E_rp_list = []
    random_weight_array_df_rows_list = []
    excess_return_df = generate_excess_return(log_returns[most_diversify_portfolio_assets_list])
    
    for i in  range(0, trial):         
        random_weight_array_df = portfolio_random_weight_array_df(uncorrelated_assets_returns_log_returns_df(log_returns, 
                                                                                        most_diversify_portfolio_assets_list))
        
        random_weight_array_df_rows_list.append(random_weight_array_df)
        
        E_rp_list.append(portfolio_expected_Return(random_weight_array_df,uncorrelated_assets_returns_log_returns_df(log_returns,
                                                                                               most_diversify_portfolio_assets_list)))
        
        σp_list.append(portfolio_volatility(excess_return_varcovar(excess_return_df),random_weight_array_df))
    
    uncorelated_portfolio_trails_simulation_df =  pd.DataFrame({'σp':σp_list,'E_rp':E_rp_list}, index=[i for i in range(0,trial)])
    σp = uncorelated_portfolio_trails_simulation_df['σp']
    E_rp = uncorelated_portfolio_trails_simulation_df['E_rp']
    sharpes_rat = E_rp/σp
    uncorelated_portfolio_trails_simulation_sharpes_ratio_df = pd.DataFrame({'σp':σp,'E_rp':E_rp,'sharpes_ratio':sharpes_rat})
    
    random_weight_array_all_rows_df = pd.concat(random_weight_array_df_rows_list, axis=0,ignore_index=True)
    uncorelated_weighted_portfolio_trails_simulation_df = uncorelated_portfolio_trails_simulation_sharpes_ratio_df.merge(random_weight_array_all_rows_df, 
                                                                                                                         left_index=True, right_index=True)
    
    return uncorelated_portfolio_trails_simulation_df,uncorelated_portfolio_trails_simulation_sharpes_ratio_df, \
                                        random_weight_array_all_rows_df,uncorelated_weighted_portfolio_trails_simulation_df
In [24]:
uncorelated_portfolio_trails_simulation_df,uncorelated_portfolio_trails_simulation_sharpes_ratio_df, random_weight_array_all_rows_df, \
                                   uncorelated_weighted_portfolio_trails_simulation_df = \
                                   uncorelated_portfolio_trals_simulation(log_returns, most_important_assets_list, 10000)

  
X_df =generate_excess_return(most_diversify_portfolio_assets_log_returns_df)
Excess_return_varcovar = excess_return_varcovar(X_df)
display(Excess_return_varcovar)
efficient_frontiere_plot(uncorelated_weighted_portfolio_trails_simulation_df)
AGI ATS BMO BN BTO CIX CNQ CWB DOL DOO ENB IGM PEY SIL SLF TD WFG
AGI 0.000922 0.000066 0.000054 0.000106 0.000042 0.000170 0.000091 0.000062 0.000095 0.000094 0.000083 0.000099 0.000085 0.000645 0.000079 0.000057 0.000170
ATS 0.000066 0.000568 0.000157 0.000192 0.000203 0.000073 0.000211 0.000097 0.000111 0.000108 0.000133 0.000130 0.000101 0.000120 0.000137 0.000142 0.000192
BMO 0.000054 0.000157 0.000350 0.000318 0.000383 0.000156 0.000384 0.000129 0.000177 0.000172 0.000247 0.000181 0.000207 0.000120 0.000251 0.000269 0.000290
BN 0.000106 0.000192 0.000318 0.000508 0.000412 0.000187 0.000377 0.000165 0.000208 0.000203 0.000263 0.000260 0.000243 0.000158 0.000275 0.000277 0.000333
BTO 0.000042 0.000203 0.000383 0.000412 0.000859 0.000255 0.000451 0.000173 0.000216 0.000210 0.000300 0.000212 0.000301 0.000124 0.000311 0.000330 0.000377
CIX 0.000170 0.000073 0.000156 0.000187 0.000255 0.001355 0.000207 0.000097 0.000122 0.000120 0.000137 0.000125 0.000148 0.000182 0.000149 0.000132 0.000165
CNQ 0.000091 0.000211 0.000384 0.000377 0.000451 0.000207 0.000955 0.000169 0.000236 0.000227 0.000387 0.000214 0.000271 0.000186 0.000324 0.000332 0.000375
CWB 0.000062 0.000097 0.000129 0.000165 0.000173 0.000097 0.000169 0.000116 0.000098 0.000093 0.000113 0.000156 0.000094 0.000099 0.000117 0.000111 0.000152
DOL 0.000095 0.000111 0.000177 0.000208 0.000216 0.000122 0.000236 0.000098 0.000147 0.000141 0.000155 0.000152 0.000137 0.000129 0.000162 0.000159 0.000182
DOO 0.000094 0.000108 0.000172 0.000203 0.000210 0.000120 0.000227 0.000093 0.000141 0.000144 0.000149 0.000144 0.000137 0.000126 0.000158 0.000154 0.000180
ENB 0.000083 0.000133 0.000247 0.000263 0.000300 0.000137 0.000387 0.000113 0.000155 0.000149 0.000307 0.000150 0.000179 0.000129 0.000208 0.000219 0.000243
IGM 0.000099 0.000130 0.000181 0.000260 0.000212 0.000125 0.000214 0.000156 0.000152 0.000144 0.000150 0.000304 0.000138 0.000137 0.000169 0.000157 0.000202
PEY 0.000085 0.000101 0.000207 0.000243 0.000301 0.000148 0.000271 0.000094 0.000137 0.000137 0.000179 0.000138 0.000216 0.000107 0.000186 0.000191 0.000220
SIL 0.000645 0.000120 0.000120 0.000158 0.000124 0.000182 0.000186 0.000099 0.000129 0.000126 0.000129 0.000137 0.000107 0.000681 0.000123 0.000106 0.000214
SLF 0.000079 0.000137 0.000251 0.000275 0.000311 0.000149 0.000324 0.000117 0.000162 0.000158 0.000208 0.000169 0.000186 0.000123 0.000278 0.000223 0.000260
TD 0.000057 0.000142 0.000269 0.000277 0.000330 0.000132 0.000332 0.000111 0.000159 0.000154 0.000219 0.000157 0.000191 0.000106 0.000223 0.000281 0.000258
WFG 0.000170 0.000192 0.000290 0.000333 0.000377 0.000165 0.000375 0.000152 0.000182 0.000180 0.000243 0.000202 0.000220 0.000214 0.000260 0.000258 0.000809
σp E_rp sharpes_ratio AGI ATS BMO BN BTO CIX CNQ CWB DOL DOO ENB IGM PEY SIL SLF TD WFG
0 1.463924 0.049216 0.033619 0.061478 0.028326 0.071094 0.031965 0.040477 0.001287 0.097337 0.106726 0.082775 0.069108 0.086733 0.056751 0.010179 0.098261 0.041318 0.016895 0.099291
1 1.456257 0.050297 0.034539 0.095308 0.051838 0.052191 0.090941 0.042232 0.079520 0.032087 0.006965 0.065352 0.064341 0.103519 0.051785 0.037692 0.024174 0.028138 0.104734 0.069183
2 1.519654 0.050381 0.033153 0.043578 0.099466 0.025888 0.102048 0.060712 0.003116 0.070548 0.019659 0.064289 0.085615 0.062783 0.048497 0.033337 0.001310 0.105518 0.072775 0.100860
3 1.447017 0.055641 0.038452 0.114172 0.100494 0.025847 0.012265 0.012440 0.125598 0.011601 0.027697 0.046409 0.047930 0.027334 0.119180 0.001083 0.066607 0.100469 0.045745 0.115127
4 1.590675 0.052442 0.032969 0.026852 0.037283 0.023290 0.119195 0.077515 0.118204 0.071668 0.047571 0.004109 0.101898 0.096280 0.009467 0.016165 0.006370 0.022246 0.111510 0.110377
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 1.450366 0.041744 0.028781 0.021174 0.046835 0.024910 0.028640 0.113835 0.070572 0.031758 0.000404 0.054368 0.109426 0.076364 0.057101 0.113578 0.065853 0.097676 0.070423 0.017085
9996 1.499805 0.049326 0.032888 0.030985 0.096667 0.077471 0.043811 0.086070 0.098646 0.093373 0.023148 0.083306 0.106571 0.064479 0.006410 0.063945 0.015442 0.068136 0.016169 0.025370
9997 1.466001 0.053268 0.036335 0.077047 0.034728 0.060395 0.092088 0.034346 0.014701 0.080867 0.105010 0.019764 0.073559 0.046928 0.090525 0.029031 0.035272 0.059986 0.047293 0.098459
9998 1.368392 0.046324 0.033853 0.074358 0.023944 0.005396 0.048464 0.042889 0.100645 0.024404 0.077556 0.044457 0.085507 0.087827 0.101470 0.099017 0.075380 0.019033 0.081825 0.007829
9999 1.480896 0.051103 0.034508 0.064211 0.022347 0.059236 0.106488 0.073483 0.030267 0.028149 0.072220 0.000783 0.036406 0.042473 0.135150 0.101605 0.069185 0.049160 0.012325 0.096511

10000 rows × 20 columns

In [25]:
#--------------------------------------------------Efficient Frontiere Optimal Points-----------------------------------------------
# get data frame top1 portfolio
# selecting the optimal portfolios:portfolios with expected return higher or equal the minimun risky portfolio
#sort the optimal portfolio data frame by selected value:ascending=True
#return the data frame
#---------------------------------------------------------------------------------------------------------------------------------
def efficient_frontiere_selected_sharpe_ratio_portfolio_df(uncorrelated_weighted_portfolio_trails_simulation_df,selected_col): 
    
    uncorrelated_weighted_portfolio_trails_simulation_sorted_df = uncorrelated_weited_portfolio_trails_simulation_df.sort_values(by='sharpes_ratio', ascending=False)
    uncorrelated_weighted_portfolio_trails_simulation_sorted_df = uncorrelated_weited_portfolio_trails_simulation_sorted_df.reset_index(drop=True)

    top1_sharpe_ratio_value = uncorrelated_weighted_portfolio_trails_simulation_sorted_df['sharpes_ratio'].values[0]
    top1_E_rp_value= uncorrelated_weighted_portfolio_trails_simulation_sorted_df['E_rp'].values[0]
    top1_σp_value = uncorrelated_weighted_portfolio_trails_simulation_sorted_df['σp'].values[0]

    # selecting the optimal portfolios:portfolios with expected return higher or equal the minimun risky portfolio
    if selected_col == 'sharpes_ratio':
        uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df = \
        uncorrelated_weighted_portfolio_trails_simulation_sorted_df[uncorrelated_weighted_portfolio_trails_simulation_sorted_df[selected_col] >= top1_sharpe_ratio_value] 
    elif selected_col == 'E_rp':
        uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df = uncorrelated_weighted_portfolio_trails_simulation_sorted_df
                
    # sort the optimal portfolio data frame
    uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df = \
    uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df.sort_values(by='σp', ascending=True)
    uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df = \
    uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df.reset_index(drop=True)
    return uncorelated_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df

#----------------------------------------------------------------------------------------
def efficient_frontiere_optimal_sharpe_ratio_portfolios_model_points(uncorrelated_weighted_portfolio_trails_simulation_df,number_of_top_points = 35):
    #sort from maximum sharpe ratio and get top sharpe ratio portfolios
    portfolio_trails_simulation_sharpes_ratio_top_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='sharpes_ratio', 
                                                                                                                        ascending=False)
    portfolio_trails_simulation_sharpes_ratio_top_df = portfolio_trails_simulation_sharpes_ratio_top_df.reset_index(drop=True)
    uncorelated_portfolio_trails_simulation_sharpes_ratio_top_df =portfolio_trails_simulation_sharpes_ratio_top_df.head(number_of_top_points)
    xpoints_list = []
    ypoints_list = []
    top_sharpe_ratio_value_points_list = []

    for portfolio_number in range(number_of_top_points):
        #top shape ratio    
        top_sharpe_ratio_value_points_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['sharpes_ratio'].values[portfolio_number])
        xpoints_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['σp'].values[portfolio_number])
        ypoints_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['E_rp'].values[portfolio_number])

    xpoints = np.array(xpoints_list)
    ypoints = np.array(ypoints_list)
    top_sharpe_ratio_value_points = np.array(top_sharpe_ratio_value_points_list)

    return xpoints, ypoints, top_sharpe_ratio_value_points

def get_maximun_return_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df):
    
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='E_rp', 
                                                                                                                                ascending=False)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df.reset_index(drop=True)

    max_E_rp_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['sharpes_ratio'].values[0]  
    max_E_rp = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['E_rp'].values[0]
    max_E_rp_σp = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['σp'].values[0]

    return max_E_rp_sharpe_ratio, max_E_rp, max_E_rp_σp
 
def get_maximun_risk_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df):
    # here  the portfolios are sotrted from maximum risk
    portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df = \
                                uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='σp', ascending=False)
    portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df = \
                                        portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df.reset_index(drop=True)

    max_σp_E_rp_sharpe_ratio = portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df['sharpes_ratio'].values[0]                                                                                                                  
    max_σp_E_rp = portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df['E_rp'].values[0]
    max_σp = portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df['σp'].values[0]
    
    return max_σp_E_rp_sharpe_ratio, max_σp_E_rp, max_σp

def get_minimum_risk_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df):
    # here  the portfolios are sotrted from minimum risk 
    portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df = \
                                            uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='σp', ascending=True)
    portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df.reset_index(drop=True)
    minimun_σp_E_rp_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['sharpes_ratio'].values[0]
    minimun_σp_E_rp = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['E_rp'].values[0]
    minimun_σp = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['σp'].values[0]
    return minimun_σp_E_rp_sharpe_ratio, minimun_σp_E_rp, minimun_σp

def get_maximum_sharpe_ratio(uncorrelated_weighted_portfolio_trails_simulation_df):
    #sort from maximum sharpe ratio and get top sharpe ratio portfolios
    portfolio_trails_simulation_sharpes_ratio_top_df = \
                                        uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='sharpes_ratio', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_top_df = portfolio_trails_simulation_sharpes_ratio_top_df.reset_index(drop=True)
    maximum_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_top_df['sharpes_ratio'].values[0]
    maximum_sharpe_ratio_σp_E_rp = portfolio_trails_simulation_sharpes_ratio_top_df['E_rp'].values[0]
    maximum_sharpe_ratio_σp = portfolio_trails_simulation_sharpes_ratio_top_df['σp'].values[0]
    return maximum_sharpe_ratio, maximum_sharpe_ratio_σp_E_rp, maximum_sharpe_ratio_σp


#-----------------------------------------------------------------------------------------------------------------------------------
def efficient_frontiere_optimal_portfolios_model_points(uncorrelated_weighted_portfolio_trails_simulation_df,number_of_top_points = 35):
      
    #number_of_top_points = 35   
    #sort from maximum sharpe ratio and get top sharpe ratio portfolios
    portfolio_trails_simulation_sharpes_ratio_top_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='sharpes_ratio', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_top_df = portfolio_trails_simulation_sharpes_ratio_top_df.reset_index(drop=True)
    uncorelated_portfolio_trails_simulation_sharpes_ratio_top_df =portfolio_trails_simulation_sharpes_ratio_top_df.head(number_of_top_points)
    
    # minimum risk portfolio: here  the portfolios are sotrted from minimum risk 
    portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='σp', ascending=True)
    portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df.reset_index(drop=True)
    portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df.head(number_of_top_points)
    
    minimun_σp_E_rp_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['sharpes_ratio'].values[0]
    minimun_σp_E_rp = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['E_rp'].values[0]
    minimun_σp = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['σp'].values[0]
    
    # maximun return portfolio
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='E_rp', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df.reset_index(drop=True)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df.head(number_of_top_points)
    
    max_E_rp_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['sharpes_ratio'].values[0]  
    max_E_rp = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['E_rp'].values[0]
    max_E_rp_σp = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['σp'].values[0]
    
    # maximun risk portfolio: here  the portfolios are sotrted from maximum risk
    portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='σp', ascending=False)
    portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df = portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df.reset_index(drop=True)
    portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df = portfolio_trals_simulation_sharpes_ratio_max_σp_E_rp_selecte_df.head(number_of_top_points)
    
    
        
    xpoints_list = []
    ypoints_list = []
    top_sharpe_ratio_value_points_list = []
    
    for portfolio_number in range(number_of_top_points):
        
        #top shape ratio    
        top_sharpe_ratio_value_points_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['sharpes_ratio'].values[portfolio_number])
        xpoints_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['σp'].values[portfolio_number])
        ypoints_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['E_rp'].values[portfolio_number])
        
        # minimum risk portfolio:
        top_sharpe_ratio_value_points_list.append(portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['sharpes_ratio'].values[portfolio_number])
        xpoints_list.append(portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['σp'].values[portfolio_number])
        ypoints_list.append(portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['E_rp'].values[portfolio_number])
        
        # maximun return portfolio
        top_sharpe_ratio_value_points_list.append(portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['sharpes_ratio'].values[portfolio_number])  
        xpoints_list.append(portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['σp'].values[portfolio_number])
        ypoints_list.append(portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['E_rp'].values[portfolio_number])
        
     
    xpoints = np.array(xpoints_list)
    ypoints = np.array(ypoints_list)
    top_sharpe_ratio_value_points = np.array(top_sharpe_ratio_value_points_list)
    
    
    return xpoints.clip(minimun_σp,max_E_rp_σp),ypoints.clip(minimun_σp_E_rp,max_E_rp), \
                    top_sharpe_ratio_value_points.clip(minimun_σp_E_rp_sharpe_ratio,max_E_rp_sharpe_ratio)

def get_maximun_minimum_points(df):
    
    # maximun return portfolio
    max_df = df.sort_values(by='E_rp', ascending=False)
    max_df = max_df.reset_index(drop=True)
    max_df = max_df.head(1)
    max_E_rp_σp = max_df['σp']
    max_E_rp    = max_df['E_rp']
    max_E_rp_sharpe_ratio = max_df['sharpes_ratio']
    
    # minimum return portfolio
    min_df = df.sort_values(by='E_rp', ascending=True)
    min_df = min_df.reset_index(drop=True)
    min_df = min_df.head(1)
    minimun_σp = min_df['σp']
    minimun_σp_E_rp    = min_df['E_rp']
    minimun_σp_E_rp_sharpe_ratio = min_df['sharpes_ratio']
    
    return max_E_rp_σp, max_E_rp, max_E_rp_sharpe_ratio, minimun_σp, minimun_σp_E_rp, minimun_σp_E_rp_sharpe_ratio

#-----------------------------------------Efficient Frontiere Model Plotting-----------------------------------------------------------------------------------
#call efficient_frontiere_optimal_portfolios_df to include sharpe ration dataframe to the trails protfolios dataframe
#and select the optimal portfolios 
#prepare data for plotting and create the scatter plot
#include sharpe ration dataframe to the trails protfolios dataframe
#select the optimal portfolios(portfolios with expected return higher or equal to the minimumal risk portfolio)
#sorted by sharpe ration efficient_frontiere_selected_sharpe_ratio_portfolio_df
#------------------------------------------------------------------------------------------------------------------------------
def plot_fitted_curve(uncorrelated_weighted_portfolio_trails_simulation_df,fig, ax, label, marker, color ):
    #points plotting
       
    xpoints,ypoints,top_sharpe_ratio_value_points = \
                efficient_frontiere_optimal_portfolios_model_points(uncorrelated_weighted_portfolio_trails_simulation_df,7)
    
    row, col = uncorelated_weighted_portfolio_trails_simulation_df.shape
    #--model definition---
    mymodel = np.poly1d(np.polyfit(xpoints, ypoints,2))
    popt = np.polyfit(xpoints, ypoints,2)
    a, b, c = popt
    poly_d2_form = str('y =%.5f * x^2 + %.5f * x + %.5f' % (a, b, c))
    display(np.polyfit(xpoints, ypoints,2))
    myline = np.linspace(xpoints.min(), xpoints.max(), row) 
    
    # optimal portfolios plotting
    ypred = mymodel(myline)
    ax.plot(xpoints,ypoints,'*',color='red',label='Optimal portfolios')
    ax.plot(myline, mymodel(myline),'.',color="blue",label=label + ':\n'+poly_d2_form)
    print(r2_score(ypoints, mymodel(xpoints)))
    
def plot_random_portfolios(uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax, colorbar = 'yes'):
   
    #random portfolio plotting
    optimal_portfolios_df = uncorrelated_weighted_portfolio_trails_simulation_df
    sharpes_ratio_optimal_portfolios_σp_col = optimal_portfolios_df['σp']
    sharpes_ratio_optimal_portfolios_E_rp_col = optimal_portfolios_df['E_rp']
    optimal_portfolios_sharpes_ratio_col = optimal_portfolios_df['sharpes_ratio']
                                                                 
    scplt = ax.scatter(sharpes_ratio_optimal_portfolios_σp_col, sharpes_ratio_optimal_portfolios_E_rp_col, marker="o",
                       c=optimal_portfolios_sharpes_ratio_col, cmap="viridis",label='Random Portfolios')
    if colorbar == 'yes':
        cb = fig.colorbar(scplt, ax=ax, label='Sharpe Ratio')
    ax.set_title("Towards an Efficient Frontier Model - Random portfolios Efficient Frontier")

    
def plot_fitted_curve_and_random_portfolios(uncorrelated_weighted_portfolio_trails_simulation_df):
    fig, ax =plt.subplots(figsize=(12, 5))                                                               
    plot_fitted_curve(uncorrelated_weighted_portfolio_trails_simulation_df,fig, ax, label='Model to Approximate', marker= '*', color='red')                                                           
    plot_random_portfolios(uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax)                                                             
    ax.legend(prop = { "size": 8 })   

#plot_fitted_curve_and_random_portfolios(uncorelated_weighted_portfolio_trails_simulation_df)
In [26]:
#----------------------------------------------------------------------------------
# minimum risk portfolio: here  the portfolios are sotrted from minimum risk 
#-----------------------------------------------------------------------------------
def portfolio_strategy_minimum_risk(uncorelated_weighted_portfolio_trails_simulation_df,number_of_top_points):
    
    portfolio_trails_simulation_minimum_risk_σp_E_rp_df = uncorelated_weighted_portfolio_trails_simulation_df.sort_values(
                                                        by='σp', ascending=True)
    portfolio_trails_simulation_minimum_risk_σp_E_rp_df = portfolio_trails_simulation_minimum_risk_σp_E_rp_df.reset_index(drop=True)
    portfolio_trails_simulation_minimum_risk_σp_E_rp_df = portfolio_trails_simulation_minimum_risk_σp_E_rp_df.head(number_of_top_points)
    
    portfolio_weight_df = portfolio_trails_simulation_minimum_risk_σp_E_rp_df[most_diversify_portfolio_assets_list]
    
    portfolio_weight_df = portfolio_weight_df*100
        
    portfolio_weight_df1 = portfolio_weight_df.head(1)
    portfolio_weight =portfolio_weight_df1.columns.values.tolist()
    asset_stickers = portfolio_weight_df1.iloc[0].tolist()
    portfolio_investment_strategy_df = pd.DataFrame({'Portfolio Weight':portfolio_weight,'Asset Stickers':asset_stickers})
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Stickers',ascending=True)
    
    portfolio_investment_strategy_Trans_df = portfolio_investment_strategy_df.transpose()
    strategy_Weight = portfolio_investment_strategy_df['Portfolio Weight']
    strategy_Stickers = portfolio_investment_strategy_df['Asset Stickers']
    
    return strategy_Weight, strategy_Stickers, portfolio_trails_simulation_minimum_risk_σp_E_rp_df
    
#----------------------------------------------------------------------------
#maximun risk portfolio: here  the portfolios are sotrted from minimum risk
#----------------------------------------------------------------------------
def portfolio_strategy_maximun_risk(uncorelated_weighted_portfolio_trails_simulation_df,number_of_top_points):
    #log_returns,threshold
    portfolio_trails_simulation_max_risk_σp_E_rp_df = uncorelated_weighted_portfolio_trails_simulation_df.sort_values(
                                                        by='σp', ascending=False)
    portfolio_trails_simulation_max_risk_σp_E_rp_df = portfolio_trails_simulation_max_risk_σp_E_rp_df.reset_index(drop=True)
    portfolio_trails_simulation_max_risk_σp_E_rp_df = portfolio_trails_simulation_max_risk_σp_E_rp_df.head(number_of_top_points)
    portfolio_weight_df = portfolio_trails_simulation_max_risk_σp_E_rp_df[most_diversify_portfolio_assets_list]
    portfolio_weight_df = portfolio_weight_df*100
        
    portfolio_weight_df1 = portfolio_weight_df.head(1)
    portfolio_weight =portfolio_weight_df1.columns.values.tolist()
    asset_stickers = portfolio_weight_df1.iloc[0].tolist()
    portfolio_investment_strategy_df = pd.DataFrame({'Portfolio Weight':portfolio_weight,'Asset Stickers':asset_stickers})
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Stickers',ascending=True)
    
    portfolio_investment_strategy_Trans_df = portfolio_investment_strategy_df.transpose()
    #display(portfolio_investment_strategy_Trans_df)
    strategy_Weight = portfolio_investment_strategy_df['Portfolio Weight']
    strategy_Stickers = portfolio_investment_strategy_df['Asset Stickers']
    
    return strategy_Weight, strategy_Stickers, portfolio_trails_simulation_max_risk_σp_E_rp_df

#---------------------------------
# maximun return portfolio
#--------------------------------
def portfolio_strategy_maximun_return(uncorelated_weighted_portfolio_trails_simulation_df,number_of_top_points):
    
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = uncorelated_weighted_portfolio_trails_simulation_df.sort_values(
                                                        by='E_rp', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df.reset_index(drop=True)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df.head(number_of_top_points)
    
    portfolio_weight_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df[most_diversify_portfolio_assets_list]
    
    portfolio_weight_df = portfolio_weight_df*100
        
    portfolio_weight_df1 = portfolio_weight_df.head(1)
    portfolio_weight =portfolio_weight_df1.columns.values.tolist()
    asset_stickers = portfolio_weight_df1.iloc[0].tolist()
    portfolio_investment_strategy_df = pd.DataFrame({'Portfolio Weight':portfolio_weight,'Asset Stickers':asset_stickers})
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Stickers',ascending=True)
    
    portfolio_investment_strategy_Trans_df = portfolio_investment_strategy_df.transpose()
    #display(portfolio_investment_strategy_Trans_df)
    strategy_Weight = portfolio_investment_strategy_df['Portfolio Weight']
    strategy_Stickers = portfolio_investment_strategy_df['Asset Stickers']
    
    return strategy_Weight, strategy_Stickers, portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df

#-------------------------------------------------------------------------
#sort from maximum sharpe ration and get top sharpe ratio portfolios
#-------------------------------------------------------------------------
def portfolio_strategy_top_sharpe_ratio(uncorelated_weighted_portfolio_trails_simulation_df,number_of_top_points):
    
    portfolio_trails_simulation_sharpes_ratio_top_df = uncorelated_weighted_portfolio_trails_simulation_df.sort_values(
                                                        by='sharpes_ratio', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_top_df = portfolio_trails_simulation_sharpes_ratio_top_df.reset_index(drop=True)
    uncorrelated_portfolio_trails_simulation_sharpes_ratio_top_df = portfolio_trails_simulation_sharpes_ratio_top_df.head(number_of_top_points)
    
    portfolio_weight_df = uncorrelated_portfolio_trails_simulation_sharpes_ratio_top_df[most_diversify_portfolio_assets_list]
    portfolio_weight_df = portfolio_weight_df*100
        
    portfolio_weight_df1 = portfolio_weight_df.head(1)
    portfolio_weight =portfolio_weight_df1.columns.values.tolist()
    asset_stickers = portfolio_weight_df1.iloc[0].tolist()
    portfolio_investment_strategy_df = pd.DataFrame({'Portfolio Weight':portfolio_weight,'Asset Stickers':asset_stickers})
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Stickers',ascending=True)
    
    portfolio_investment_strategy_Trans_df = portfolio_investment_strategy_df.transpose()
    #display(portfolio_investment_strategy_Trans_df)
    strategy_Weight = portfolio_investment_strategy_df['Portfolio Weight']
    strategy_Stickers = portfolio_investment_strategy_df['Asset Stickers']
    
    return strategy_Weight, strategy_Stickers, uncorrelated_portfolio_trails_simulation_sharpes_ratio_top_df


#------------------------------------------------------------------------
#sort from maximum sharpe ratio and get top sharpe ratio portfolios
#------------------------------------------------------------------------
def portfolio_strategy_plotting(uncorelated_weighted_portfolio_trails_simulation_df,number_of_top_points):
    fig, ax =plt.subplots(2,2, figsize=(14, 10))
    
    strategy_Weight, strategy_Stickers,uncorrelated_portfolio_trails_simulation_sharpes_ratio_top_df = \
        portfolio_strategy_top_sharpe_ratio(uncorelated_weighted_portfolio_trails_simulation_df, number_of_top_points)
   
    bar_container= ax[0,0].barh(strategy_Weight,strategy_Stickers)
    # setting label of y-axis
    ax[0,0].set_ylabel("Asset Stickers")
    # setting label of x-axis
    #ax[0,0].set_xlabel("Portfolio Weight") 
    ax[0,0].set_title("Maximum Sharpe Ratio Portfolio Assets Allocation")
    ax[0,0].bar_label(bar_container, fmt='{:,.0f}%')
   
    
    # maximun return portfolio
    strategy_Weight, strategy_Stickers,portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = \
        portfolio_strategy_maximun_return(uncorelated_weighted_portfolio_trails_simulation_df, number_of_top_points) 
    bar_container= ax[0,1].barh(strategy_Weight,strategy_Stickers)
 
    # setting label of y-axis
    ax[0,1].set_ylabel("Asset Stickers")
    # setting label of x-axis
    #ax[0,1].set_xlabel("Portfolio Weight") 
    ax[0,1].set_title("Maximun Return Portfolio Assets Allocation")
    ax[0,1].bar_label(bar_container, fmt='{:,.0f}%')
    
     # maximun risk portfolio: here  the portfolios are sotrted from minimum risk
    strategy_Weight, strategy_Stickers,portfolio_trails_simulation_max_risk_σp_E_rp_df = \
        portfolio_strategy_maximun_risk(uncorelated_weighted_portfolio_trails_simulation_df, number_of_top_points) 
    
    bar_container= ax[1,0].barh(strategy_Weight,strategy_Stickers)
    # setting label of y-axis
    ax[1,0].set_ylabel("Asset Stickers")
    # setting label of x-axis
    #ax[1,0].set_xlabel("Portfolio Weight") 
    ax[1,0].set_title("Maximun risk Portfolio Assets Allocation")
    ax[1,0].bar_label(bar_container, fmt='{:,.0f}%')
    
    # minimum risk portfolio: here  the portfolios are sotrted from maximun risk 
    strategy_Weight, strategy_Stickers,portfolio_trails_simulation_minimum_risk_σp_E_rp_df = \
        portfolio_strategy_minimum_risk(uncorelated_weighted_portfolio_trails_simulation_df, number_of_top_points)
    
    bar_container= ax[1,1].barh(strategy_Weight,strategy_Stickers)
    # setting label of y-axis
    ax[1,1].set_ylabel("Asset Stickers")
    # setting label of x-axis
    #ax[1,1].set_xlabel("Portfolio Weight") 
    ax[1,1].set_title("Minimum risk Portfolio Assets Allocation")
    ax[1,1].bar_label(bar_container, fmt='{:,.0f}%')
    
    plt.show()

plot_fitted_curve_and_random_portfolios(uncorelated_weighted_portfolio_trails_simulation_df)    
portfolio_strategy_plotting(uncorelated_weighted_portfolio_trails_simulation_df,10) 
array([-0.28822396,  0.89777456, -0.63458208])
0.9341898618645373

Efficient Frontier modelling using Machine Learning technics¶

Data Splitting / Model Selection¶

In [27]:
# Data Splitting / Model Selection
def polynomial_degree2_model(uncorelated_weighted_portfolio_trails_simulation_df):
     # Load the data : original random portfolios data points
    xpoints, ypoints, original_random_sharpe_ratio = \
    efficient_frontiere_optimal_portfolios_model_points( uncorelated_weighted_portfolio_trails_simulation_df)
    
    #original_random_portfolios_df = pd.DataFrame({'σp':xpoints,'E_rp':ypoints,'sharpes_ratio':original_random_sharpe_ratio})
    
    # Build the model
    def model_poly_d2(x, a, b, c):
        return b * x**2 + a * x + c
    
    #Split tranning, validation and testing data  
    x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2 = \
    train_test_split(xpoints, ypoints, test_size=0.3, random_state=42)
    x_model_validation_poly_d2, x_model_testing_poly_d2 = train_test_split(np.linspace(min(xpoints), max(xpoints), 
                                                                                       len(xpoints)), test_size=0.3, random_state=42)
   
    # model traning to get paarameters
    popt_poly_d2, pcov_poly_d2 = curve_fit(model_poly_d2, x_train_poly_d2,y_train_poly_d2)   
    a, b, c = popt_poly_d2
    
    poly_d2_form  = str('y =%.5f * x^2 + %.5f * x + %.5f' % (a, b, c))
    
    return x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2,popt_poly_d2, \
                pcov_poly_d2, x_model_validation_poly_d2, x_model_testing_poly_d2, model_poly_d2, poly_d2_form

#x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2,popt_poly_d2, \
#                pcov_poly_d2, x_model_validation_poly_d2, x_model_testing_poly_d2, model_poly_d2, poly_d2_form = \
#                                            polynomial_degree2_model(uncorelated_weighted_portfolio_trails_simulation_df)
In [28]:
def polynomial_degree3_log_model(uncorelated_weighted_portfolio_trails_simulation_df):
    # Load the data : original random portfolios data points
    xpoints,ypoints,original_random_sharpe_ratio = efficient_frontiere_optimal_portfolios_model_points(uncorelated_weighted_portfolio_trails_simulation_df)
    
    #original_random_portfolios_df = pd.DataFrame({'σp':xpoints,'E_rp':ypoints,'sharpes_ratio':original_random_sharpe_ratio})
    
    # Build the model
    def model_poly_d3_log(x, a, b, c, d, e):
        return a * np.log(abs(b )* x) + c*x**3 +d*x**2 + e
    
    #Split tranning and testing data  
    x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log = \
                                                    train_test_split(xpoints, ypoints, test_size=0.3, random_state=42)
    x_model_validation_poly_d3_log, x_model_testing_poly_d3_log = \
                                train_test_split(np.linspace(min(xpoints), max(xpoints), len(xpoints)), test_size=0.3, random_state=42)
    
    # model validation data
    #x_model_validation = np.linspace(min(x_train), max(x_train), number_of_top_points*3) 
    
    # model traning to get poarameters
    popt_poly_d3_log, pcov_poly_d3_log = curve_fit(model_poly_d3_log, x_train_poly_d3_log,y_train_poly_d3_log)   
    a, b, c, d, e = popt_poly_d3_log 
    
    poly_d3_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**3 + %.5f * x + %.5f' % (a, b, c, d, e))

    
    return x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log,popt_poly_d3_log, pcov_poly_d3_log, \
                                x_model_validation_poly_d3_log, x_model_testing_poly_d3_log, model_poly_d3_log, poly_d3_log_form

#x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log,popt_poly_d3_log, pcov_poly_d3_log, \
#                                x_model_validation_poly_d3_log, x_model_testing_poly_d3_log, model_poly_d3_log, poly_d3_log_form = \
#                                                            polynomial_degree3_log_model(uncorelated_weighted_portfolio_trails_simulation_df)
In [29]:
def polynomial_degree5_log_model(uncorelated_weighted_portfolio_trails_simulation_df):
     # Load the data : original random portfolios data points
    xpoints,ypoints,original_random_sharpe_ratio = efficient_frontiere_optimal_portfolios_model_points(uncorelated_weighted_portfolio_trails_simulation_df)
    
    # Build the model
    def model_poly_d5_log(x, a, b, c):
        return a*np.log(abs(b)*x) + c*x**5
    
    #Split tranning and testing data  
    x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log = \
                                        train_test_split(xpoints, ypoints, test_size=0.3, random_state=42)
    x_model_validation_poly_d5_log, x_model_testing_poly_d5_log = \
                                train_test_split(np.linspace(min(xpoints), max(xpoints), len(xpoints)), test_size=0.3, random_state=42)

    popt_poly_d5_log, pcov_poly_d5_log = curve_fit(model_poly_d5_log, x_train_poly_d5_log,y_train_poly_d5_log)   
    a, b, c = popt_poly_d5_log
    
    poly_d5_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**5' % (a, b, c))
    
    return x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log, popt_poly_d5_log, pcov_poly_d5_log, \
                                x_model_validation_poly_d5_log, x_model_testing_poly_d5_log, model_poly_d5_log, poly_d5_log_form

#x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log, popt_poly_d5_log, pcov_poly_d5_log, \
#                                x_model_validation_poly_d5_log, x_model_testing_poly_d5_log, model_poly_d5_log, poly_d5_log_form = \
#                polynomial_degree5_log_model(uncorelated_weighted_portfolio_trails_simulation_df)
In [30]:
def models_plotting(x, y, uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax, model_form):                                                                                 
    #------Random portfolio data plotting
    plot_random_portfolios(uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax,'no')
    cspl = ax.scatter(x=x, y=y, c=y/x, cmap="viridis",label='Efficient Frontier:\n'+model_form)
    #-----------model to approximate
    plot_fitted_curve(uncorrelated_weighted_portfolio_trails_simulation_df,fig, ax, label='Fitted Curve', marker= '*', color='red')
    
    ax.legend(bbox_to_anchor=(0.72, 1.38), ncol=1, prop = { "size": 8})
    return cspl
In [31]:
def dataframe_clipping(x_σp, y_E_rp, y_E_rp_pred ):
    
    clipped_df = pd.DataFrame({'σp':x_σp,'E_rp':y_E_rp,'y_E_rp_pred':y_E_rp_pred,'error':y_E_rp_pred - y_E_rp})
    clipped_df = clipped_df.sort_values(by='error',ascending=False)
    clipped_df['y_optimal_E_rp'] = np.where(clipped_df['E_rp'] <= clipped_df['y_E_rp_pred'], clipped_df['E_rp'],clipped_df['y_E_rp_pred'] )
    clipped_df['sharpes_ratio'] = clipped_df['y_optimal_E_rp']/clipped_df['σp']
    return clipped_df[clipped_df['error'] >= 0]
    
def model_uperBound_efficient_frontier( uncorrelated_weighted_portfolio_trails_simulation_df, model, model_popt, 
                                       ax , mode_form, random_points = 0):
    
    optimal_portfolios_df = uncorrelated_weighted_portfolio_trails_simulation_df
    x_σp = uncorrelated_weighted_portfolio_trails_simulation_df['σp']
    y_E_rp = uncorelated_weighted_portfolio_trails_simulation_df['E_rp']
    row, col = uncorelated_weighted_portfolio_trails_simulation_df.shape
    
    #here the orriginal data frame is clipped to eliminate the upper bound Outlier 
    y_E_rp_pred = model(x_σp, *model_popt)
    clipped_df = dataframe_clipping(x_σp, y_E_rp, y_E_rp_pred )
     
    xpoints,ypoints,top_sharpe_ratio_value_points = efficient_frontiere_optimal_portfolios_model_points(clipped_df,7) 
     
    #------Random portfolio data plotting
    if random_points == 0:
        
        scplt = ax.scatter(clipped_df['σp'], clipped_df['E_rp'], marker="o", c=clipped_df['E_rp']/clipped_df['σp'], 
                       cmap="viridis",label='Random Portfolios')
        
    else:        
        xrandom_points,yrandom_points,random_sharpe_ratio_value_points = \
                        efficient_frontiere_optimal_sharpe_ratio_portfolios_model_points(clipped_df,random_points)
        scplt = ax.scatter(x=xrandom_points, y=yrandom_points, marker="o", c= random_sharpe_ratio_value_points, 
                       cmap="viridis",label='Random Portfolios')
        
    
    #efficient frontier plotting 
    x_model_σp = np.linspace(xpoints.min(), xpoints.max(), row)
    y_model_E_rp_pred = model(x_model_σp, *model_popt)
    cspl = ax.scatter(x=x_model_σp, y=y_model_E_rp_pred, marker="*", c= y_E_rp_pred/x_model_σp,
                      cmap="viridis",label='Efficient Frontier:\n'+mode_form)
     
    ax.set_title("Boundary Random portfolios Efficient Frontier")
    ax.legend(bbox_to_anchor=(0.72, 1.38), ncol=1, prop = { "size": 8})
        
    return scplt

Model Evaluation¶

Model validation¶

we validate the model(model parameters) on the training and the validation data.

In [32]:
def evalute_model_parameters(uncorelated_weighted_portfolio_trails_simulation_df):
    
    #polynoial degree 2 model  b * x**2 + a * x + c
    x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2,popt_poly_d2, \
                pcov_poly_d2, x_model_validation_poly_d2, x_model_testing_poly_d2, model_poly_d2, poly_d2_form = \
                                                            polynomial_degree2_model(uncorelated_weighted_portfolio_trails_simulation_df)
    # model parameters
    a, b, c = popt_poly_d2
    #model prediction
    y_model_validation_pred_poly_d2 = model_poly_d2(x_model_validation_poly_d2, a, b, c)
      
    #polynomial degree 3 log model: a * np.log(b * x) + c*x**3 +d*x**2 + e
    x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log,popt_poly_d3_log, pcov_poly_d3_log, \
                                x_model_validation_poly_d3_log, x_model_testing_poly_d3_log, model_poly_d3_log, poly_d3_log_form = \
                                                            polynomial_degree3_log_model(uncorelated_weighted_portfolio_trails_simulation_df)
    # model parameters
    a, b, c, d, e = popt_poly_d3_log
    y_model_validation_pred_poly_d3_log = model_poly_d3_log(x_model_validation_poly_d3_log, a, abs(b), c, d, e) 
        
    #polynomial degree 5 log model: a*np.log(b*x) + c*x**5 
    x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log, popt_poly_d5_log, pcov_poly_d5_log, \
                                x_model_validation_poly_d5_log, x_model_testing_poly_d5_log, model_poly_d5_log, poly_d5_log_form = \
                polynomial_degree5_log_model(uncorelated_weighted_portfolio_trails_simulation_df)
    # model parameters
    a, b, c = popt_poly_d5_log
    y_model_validation_pred_poly_d5_log = model_poly_d5_log(x_model_validation_poly_d5_log, a,abs(b), c)
    

    return popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2,x_model_validation_poly_d3_log, \
           x_model_validation_poly_d5_log, y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, \
           y_model_validation_pred_poly_d3_log, y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, model_poly_d2, \
           model_poly_d3_log, model_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form
       
#popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2,x_model_validation_poly_d3_log, \
#x_model_validation_poly_d5_log, y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, \
#y_model_validation_pred_poly_d3_log, y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, model_poly_d2, \
#model_poly_d3_log, model_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form = \
#                                        evalute_model_parameters(uncorelated_weighted_portfolio_trails_simulation_df)
In [33]:
def model_validation_plotting(uncorrelated_weighted_portfolio_trails_simulation_df):
     
    fig, ax =plt.subplots(2,2,figsize=(13, 13), constrained_layout=True) 
    
    popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2,x_model_validation_poly_d3_log, \
    x_model_validation_poly_d5_log, y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, \
    y_model_validation_pred_poly_d3_log, y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, model_poly_d2, \
    model_poly_d3_log, model_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form = \
                                        evalute_model_parameters(uncorelated_weighted_portfolio_trails_simulation_df)

    cspl1 = models_plotting(x_model_validation_poly_d2, y_model_validation_pred_poly_d2, 
                           uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,0], poly_d2_form)
    cspl2 = models_plotting(x_model_validation_poly_d3_log, y_model_validation_pred_poly_d3_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,1],poly_d3_log_form)
    cspl = models_plotting(x_model_validation_poly_d5_log, y_model_validation_pred_poly_d5_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[1,0], poly_d5_log_form)
    cplt4 =  model_uperBound_efficient_frontier(uncorrelated_weighted_portfolio_trails_simulation_df, model_poly_d2,popt_poly_d2,
                                       ax[1,1], poly_d2_form)
    
    cb = fig.colorbar(cspl, ax=ax, label='Sharpe Ratio',orientation='horizontal',shrink=0.6)
    
model_validation_plotting(uncorelated_weighted_portfolio_trails_simulation_df)    
array([-0.28822396,  0.89777456, -0.63458208])
0.9341898618645373
array([-0.28822396,  0.89777456, -0.63458208])
0.9341898618645373
array([-0.28822396,  0.89777456, -0.63458208])
0.9341898618645373

Model fine-tuning¶

In [34]:
def fine_tune_hyperparmeters(uncorelated_weighted_portfolio_trails_simulation_df):
    
     #polynoial degree 2 model  b * x**2 + a * x + c
    x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2,popt_poly_d2, \
                pcov_poly_d2, x_model_validation_poly_d2, x_model_testing_poly_d2, model_poly_d2, poly_d2_form = \
                                             polynomial_degree2_model(uncorelated_weighted_portfolio_trails_simulation_df)
    
    y_model_turning_pred_poly_d2 = model_poly_d2(x_model_validation_poly_d2, 0.075, -0.019, -0.007)
    poly_d2_form  = str('y =%.5f * x^2 + %.5f * x + %.5f' % (0.07, -0.016, -0.009))
    
    #polynomial degree 3 log model: a * np.log(b * x) + c*x**3 +d*x**2 + e
    x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log,popt_poly_d3_log, pcov_poly_d3_log, \
                                x_model_validation_poly_d3_log, x_model_testing_poly_d3_log, model_poly_d3_log, poly_d3_log_form = \
                                         polynomial_degree3_log_model(uncorelated_weighted_portfolio_trails_simulation_df)
    
    y_model_tuning_pred_poly_d3_log = model_poly_d3_log(x_model_validation_poly_d3_log,  0.256, 0.348, 0.00793, -0.060, 0.343) 
    poly_d3_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**3 + %.5f * x + %.5f' % (0.256, 0.348, 0.00793, -0.060, 0.343))
    
    #polynomial degree 5 log model: a*np.log(b*x) + c*x**5 
    x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log, popt_poly_d5_log, pcov_poly_d5_log, \
                                x_model_validation_poly_d5_log, x_model_testing_poly_d5_log, model_poly_d5_log, poly_d5_log_form = \
                polynomial_degree5_log_model(uncorelated_weighted_portfolio_trails_simulation_df)
    
    y_model_turning_pred_poly_d5_log = model_poly_d5_log(x_model_validation_poly_d5_log,  0.085, 1.44, -0.00058)
    poly_d5_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**5' % (0.085, 1.44, -0.00058))
       
    return model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2, \
           x_model_validation_poly_d3_log, x_model_validation_poly_d5_log, y_train_poly_d2, y_model_turning_pred_poly_d2, y_train_poly_d3_log, \
           y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log, y_model_turning_pred_poly_d5_log, poly_d2_form, \
           poly_d3_log_form, poly_d5_log_form
    
#model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2, x_model_validation_poly_d3_log, \
#x_model_validation_poly_d5_log, y_train_poly_d2, y_model_tuning_pred_poly_d2, y_train_poly_d3_log, \
#y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log, y_model_tuning_pred_poly_d5_log, poly_d2_form, \
#poly_d3_log_form, poly_d5_log_form= fine_tune_hyperparmeters(uncorelated_weighted_portfolio_trails_simulation_df)
    
In [35]:
def model_tuning_plotting(uncorrelated_weighted_portfolio_trails_simulation_df):
    
    fig, ax =plt.subplots(2,2,figsize=(13, 13), constrained_layout=True) 
    
    print(" Models Fine-tuning ")

    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2, x_model_validation_poly_d3_log, \
    x_model_validation_poly_d5_log, y_train_poly_d2, y_model_tuning_pred_poly_d2, y_train_poly_d3_log, \
    y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log, y_model_tuning_pred_poly_d5_log, poly_d2_form, \
    poly_d3_log_form, poly_d5_log_form= fine_tune_hyperparmeters(uncorelated_weighted_portfolio_trails_simulation_df)

    cspl1 = models_plotting(x_model_validation_poly_d2, y_model_tuning_pred_poly_d2, 
                           uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,0], poly_d2_form)
    cspl2 = models_plotting(x_model_validation_poly_d3_log, y_model_tuning_pred_poly_d3_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,1], poly_d3_log_form)
    cspl = models_plotting(x_model_validation_poly_d5_log, y_model_tuning_pred_poly_d5_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[1,0], poly_d5_log_form) 
    cplt4 = model_uperBound_efficient_frontier(uncorrelated_weighted_portfolio_trails_simulation_df, 
                                       model_poly_d2,popt_poly_d2, ax[1,1], poly_d2_form,7000)
    
    cb = fig.colorbar(cspl, ax=ax, label='Sharpe Ratio',orientation='horizontal',shrink=0.6)
    
model_tuning_plotting(uncorelated_weighted_portfolio_trails_simulation_df)
 Models Fine-tuning 
array([-0.28822396,  0.89777456, -0.63458208])
0.9341898618645373
array([-0.28822396,  0.89777456, -0.63458208])
0.9341898618645373
array([-0.28822396,  0.89777456, -0.63458208])
0.9341898618645373

Model Testing¶

In [36]:
def test_the_model(uncorelated_weighted_portfolio_trails_simulation_df):
     #polynoial degree 2 model  b * x**2 + a * x + c
    x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2,popt_poly_d2, \
                pcov_poly_d2, x_model_validation_poly_d2, x_model_testing_poly_d2, model_poly_d2, poly_d2_form = \
                                             polynomial_degree2_model(uncorelated_weighted_portfolio_trails_simulation_df)
    
    y_model_test_pred_poly_d2 = model_poly_d2(x_test_poly_d2, 0.07, -0.016, -0.009)
    poly_d2_form  = str('y =%.5f * x^2 + %.5f * x + %.5f' % (0.07, -0.016, -0.009))
    
    #polynomial degree 3 log model: a * np.log(b * x) + c*x**3 +d*x**2 + e
    x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log,popt_poly_d3_log, pcov_poly_d3_log, \
                                x_model_validation_poly_d3_log, x_model_testing_poly_d3_log, model_poly_d3_log, poly_d3_log_form = \
                                         polynomial_degree3_log_model(uncorelated_weighted_portfolio_trails_simulation_df)
    
    y_model_test_pred_poly_d3_log = model_poly_d3_log(x_test_poly_d3_log,  0.256, 0.348, 0.00793, -0.060, 0.343) 
    poly_d3_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**3 + %.5f * x + %.5f' % (0.256, 0.348, 0.00793, -0.060, 0.343))
    
    #polynomial degree 5 log model: a*np.log(b*x) + c*x**5 
    x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log, popt_poly_d5_log, pcov_poly_d5_log, \
                                x_model_validation_poly_d5_log, x_model_testing_poly_d5_log, model_poly_d5_log, poly_d5_log_form = \
                polynomial_degree5_log_model(uncorelated_weighted_portfolio_trails_simulation_df)
     
    y_model_test_pred_poly_d5_log = model_poly_d5_log(x_test_poly_d5_log,  0.085, 1.44, -0.00058)
    poly_d5_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**5' % (0.085, 1.44, -0.00058))
       
    return model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, \
           y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
           y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form

model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, \
y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form= \
                test_the_model(uncorelated_weighted_portfolio_trails_simulation_df)      
In [37]:
def model_testing_plotting(uncorrelated_weighted_portfolio_trails_simulation_df):  
    
    fig, ax =plt.subplots(2,2,figsize=(13, 13), constrained_layout=True)  
    
    print(" Model Testing ")
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, \
    y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
    y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form= \
                test_the_model(uncorelated_weighted_portfolio_trails_simulation_df)
                                                                                   
    cspl1 = models_plotting(x_test_poly_d2, y_model_test_pred_poly_d2, 
                           uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,0], poly_d2_form)
    cspl2 = models_plotting(x_test_poly_d3_log, y_model_test_pred_poly_d3_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,1], poly_d3_log_form)
    cspl = models_plotting(x_test_poly_d5_log, y_model_test_pred_poly_d5_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[1,0], poly_d5_log_form) 
    cplt4 = model_uperBound_efficient_frontier(uncorrelated_weighted_portfolio_trails_simulation_df, \
                                        model_poly_d2,popt_poly_d2, ax[1,1], poly_d2_form)

    cb = fig.colorbar(cspl, ax=ax, label='Sharpe Ratio',orientation='horizontal',shrink=0.6)
    
model_testing_plotting(uncorelated_weighted_portfolio_trails_simulation_df)
 Model Testing 
array([-0.28822396,  0.89777456, -0.63458208])
0.9341898618645373
array([-0.28822396,  0.89777456, -0.63458208])
0.9341898618645373
array([-0.28822396,  0.89777456, -0.63458208])
0.9341898618645373

Goodness of Fit Statistics¶

In [38]:
def error_metrics_statistics(y_true_0, y_pred_0,y_true_1, y_pred_1,y_true_2, y_pred_2,  poly_d2_form, poly_d3_log_form, poly_d5_log_form ):
    
    display('Poly_d2 : '+poly_d2_form)
    display('Poly_d3_log: '+poly_d3_log_form)
    display('Poly_d5_log: '+poly_d5_log_form)
    
    error_metrics_table = [['Type Error', 'Poly_d2 Error', 'Poly_d3_log Error','Poly_d5_log Error'], 
         ['Mean Absolute Error(MAE)', mean_absolute_error(y_true_0, y_pred_0),mean_absolute_error(y_true_1, y_pred_1),mean_absolute_error(y_true_2, y_pred_2)],
         ['Mean Absolute Percentage Error(MAPE)', mean_absolute_percentage_error(y_true_0, y_pred_0),mean_absolute_percentage_error(y_true_1, y_pred_1),mean_absolute_percentage_error(y_true_2, y_pred_2)],
         ['Neg.Mean Squared Error(RMSE)', -mean_squared_error(y_true_0, y_pred_0),-mean_squared_error(y_true_1, y_pred_1),-mean_squared_error(y_true_2, y_pred_2)],
         ['R-squared score', r2_score(y_true_0, y_pred_0),r2_score(y_true_1, y_pred_1),r2_score(y_true_2, y_pred_2)],
         ['Mean Squared Error(MSE)',mean_squared_error(y_true_0, y_pred_0),mean_squared_error(y_true_1, y_pred_1),mean_squared_error(y_true_2, y_pred_2)],
         ['Mean Squared Log Error(MSLE)', mean_squared_log_error(y_true_0, y_pred_0),mean_squared_log_error(y_true_1, y_pred_1),mean_squared_log_error(y_true_2, y_pred_2)]]
             
    return error_metrics_table
In [39]:
def model_residual_metrics(y_train, y_model_validation_pred, y_model_tuning_pred, y_test, y_model_test_pred):
    
    validation_residual = y_train - y_model_validation_pred
    residuals_tuning_train = y_train - y_model_tuning_pred
    residuals_test = y_test - y_model_test_pred
    return validation_residual, residuals_tuning_train, residuals_test
    
def model_residual_plotting(y_train, y_model_validation_pred, y_model_tuning_pred, y_test, y_model_test_pred, ax, title):
    
    validation_residual, residuals_tuning_train, residuals_test = \
                                    model_residual_metrics(y_train, y_model_validation_pred, y_model_tuning_pred, y_test, y_model_test_pred)
    
    
    sns.scatterplot(ax=ax,x=y_model_validation_pred, y=validation_residual, label='Validation')
    sns.scatterplot(ax=ax,x=y_model_tuning_pred, y=residuals_tuning_train, label='Tuning')
    sns.scatterplot(ax=ax,x=y_model_test_pred, y=residuals_test, label='Test')
    
    ax.hlines(0, min(y_model_validation_pred), max(y_model_validation_pred), colors='r', linestyles='dashed')
    ax.hlines(0, min(y_model_tuning_pred), max(y_model_tuning_pred), colors='r', linestyles='dashed') 
    ax.hlines(0, min(y_model_test_pred), max(y_model_test_pred), colors='r', linestyles='dashed')
    
    ax.set_xlabel('Predicted Values')
    ax.set_ylabel('Residuals')
    ax.set_title(title)
In [40]:
def error_distribution(y_train, y_model_validation_pred, y_model_tuning_pred, y_test, y_model_test_pred, ax, title):
    #residual calculation
    validation_residual, residuals_tuning_train, residuals_test = \
                                    model_residual_metrics(y_train, y_model_validation_pred, y_model_tuning_pred, y_test, y_model_test_pred)
    
    # Calculate errors
    error_validation = -1*validation_residual
    tuning_error_train = -1*residuals_tuning_train
    error_test = -1*residuals_test 

    # Plot error distribution

    sns.histplot(ax=ax, x=error_validation, kde=True, label='Validation errors', color='blue')
    sns.histplot(ax=ax, x=tuning_error_train, kde=True, label='Tuning errors', color='orange')
    sns.histplot(ax=ax, x=error_test, kde=True, label='Test errors', color='green')
    
    ax.set_xlabel('Error')
    ax.set_ylabel('Frequency')
    ax.set_title('poly_d2 Error distribution')
    ax.legend()
    
   
In [41]:
def residual_and_error_plotting(uncorrelated_weighted_portfolio_trails_simulation_df):
    
    fig, ax =plt.subplots(2,3,figsize=(17, 17)) 
    #model validation
    popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2,x_model_validation_poly_d3_log, \
    x_model_validation_poly_d5_log, y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, \
    y_model_validation_pred_poly_d3_log, y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, model_poly_d2, \
    model_poly_d3_log, model_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form = \
                                        evalute_model_parameters(uncorelated_weighted_portfolio_trails_simulation_df)
    # Model Fine-tuning    
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2, \
    x_model_validation_poly_d3_log, x_model_validation_poly_d5_log, y_train_poly_d2, y_model_tuning_pred_poly_d2, y_train_poly_d3_log, \
    y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log, y_model_tuning_pred_poly_d5_log, poly_d2_form, \
    poly_d3_log_form, poly_d5_log_form= fine_tune_hyperparmeters(uncorelated_weighted_portfolio_trails_simulation_df)
        
    
    # Model Testing
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, \
    y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
    y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form= \
                                    test_the_model(uncorelated_weighted_portfolio_trails_simulation_df)
   
    #-------------------------------------residual plotting---------------------------------------------------------------
    #model validation
       
    # poly_d2_residual
    model_residual_plotting(y_train_poly_d2, y_model_validation_pred_poly_d2, y_model_tuning_pred_poly_d2, 
                        y_test_poly_d2, y_model_test_pred_poly_d2, ax[0,0],'Poly_d2')
    #poly_d3_log
    model_residual_plotting(y_train_poly_d3_log, y_model_validation_pred_poly_d3_log, y_model_tuning_pred_poly_d3_log, 
                        y_test_poly_d3_log, y_model_test_pred_poly_d3_log, ax[0,1],'poly_d3_log')
    #poly_d5_log
    model_residual_plotting(y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, y_model_tuning_pred_poly_d5_log, 
                        y_test_poly_d5_log, y_model_test_pred_poly_d5_log, ax[0,2],'poly_d5_log')
    
    #-----------error plotting---------------------------------------------------------------------------------------------
    # poly_d2_residual
    error_distribution(y_train_poly_d2, y_model_validation_pred_poly_d2, y_model_tuning_pred_poly_d2, 
                        y_test_poly_d2, y_model_test_pred_poly_d2, ax[1,0],'Poly_d2')
    #poly_d3_log
    error_distribution(y_train_poly_d3_log, y_model_validation_pred_poly_d3_log, y_model_tuning_pred_poly_d3_log, 
                        y_test_poly_d3_log, y_model_test_pred_poly_d3_log, ax[1,1],'poly_d3_log')
    #poly_d5_log
    error_distribution(y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, y_model_tuning_pred_poly_d5_log, 
                        y_test_poly_d5_log, y_model_test_pred_poly_d5_log, ax[1,2],'poly_d5_log')
    
In [42]:
def model_evalution_report(uncorrelated_weighted_portfolio_trails_simulation_df):

    print(" Model Validation ")   
    popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2,x_model_validation_poly_d3_log, \
    x_model_validation_poly_d5_log, y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, \
    y_model_validation_pred_poly_d3_log, y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, model_poly_d2, \
    model_poly_d3_log, model_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form = \
                                        evalute_model_parameters(uncorelated_weighted_portfolio_trails_simulation_df)
 
    print(tabulate(error_metrics_statistics(y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, y_model_validation_pred_poly_d3_log, 
                y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form), headers='firstrow',
                   tablefmt='fancy_grid', maxcolwidths=[None, 8]))
    
    print(" Model Fine-tuning ") 
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2, \
    x_model_validation_poly_d3_log, x_model_validation_poly_d5_log, y_train_poly_d2, y_model_tuning_pred_poly_d2, y_train_poly_d3_log, \
    y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log, y_model_tuning_pred_poly_d5_log, poly_d2_form, \
    poly_d3_log_form, poly_d5_log_form= fine_tune_hyperparmeters(uncorelated_weighted_portfolio_trails_simulation_df)
        
    print(tabulate(error_metrics_statistics(y_train_poly_d2, y_model_tuning_pred_poly_d2, y_train_poly_d3_log, y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log,
                        y_model_tuning_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form), headers='firstrow',
                   tablefmt='fancy_grid', maxcolwidths=[None, 8]))
    
    print(" Model Testing ")
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, \
    y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
    y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form= \
                test_the_model(uncorelated_weighted_portfolio_trails_simulation_df)
    

    print(tabulate(error_metrics_statistics(y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, 
                        y_model_test_pred_poly_d5_log,  poly_d2_form, poly_d3_log_form, poly_d5_log_form  ), headers='firstrow', 
                   tablefmt='fancy_grid', maxcolwidths=[None, 8]))
    
    residual_and_error_plotting(uncorelated_weighted_portfolio_trails_simulation_df)
    

Model Evalution Report¶

In [43]:
 model_evalution_report(uncorelated_weighted_portfolio_trails_simulation_df)
 Model Validation 
'Poly_d2 : y =1.07801 * x^2 + -0.35063 * x + -0.76641'
'Poly_d3_log: y =2.11695 * np.log( 0.87254*x) + 0.30024 * x**3 + -1.13850 * x + 1.03991'
'Poly_d5_log: y =0.36785 * np.log( 0.94175*x) + -0.00860 * x**5'
╒══════════════════════════════════════╤═════════════════╤═════════════════════╤═════════════════════╕
│ Type Error                           │   Poly_d2 Error │   Poly_d3_log Error │   Poly_d5_log Error │
╞══════════════════════════════════════╪═════════════════╪═════════════════════╪═════════════════════╡
│ Mean Absolute Error(MAE)             │      0.00888175 │         0.00889831  │         0.00887809  │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Absolute Percentage Error(MAPE) │      0.172354   │         0.172944    │         0.172191    │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Neg.Mean Squared Error(RMSE)         │     -0.0001423  │        -0.000145037 │        -0.000141573 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ R-squared score                      │     -1.03709    │        -1.07626     │        -1.02668     │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Error(MSE)              │      0.0001423  │         0.000145037 │         0.000141573 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Log Error(MSLE)         │      0.00012861 │         0.000131169 │         0.000127932 │
╘══════════════════════════════════════╧═════════════════╧═════════════════════╧═════════════════════╛
 Model Fine-tuning 
'Poly_d2 : y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'Poly_d3_log: y =0.25600 * np.log( 0.34800*x) + 0.00793 * x**3 + -0.06000 * x + 0.34300'
'Poly_d5_log: y =0.08500 * np.log( 1.44000*x) + -0.00058 * x**5'
╒══════════════════════════════════════╤═════════════════╤═════════════════════╤═════════════════════╕
│ Type Error                           │   Poly_d2 Error │   Poly_d3_log Error │   Poly_d5_log Error │
╞══════════════════════════════════════╪═════════════════╪═════════════════════╪═════════════════════╡
│ Mean Absolute Error(MAE)             │     0.0081338   │         0.0115017   │         0.00804915  │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Absolute Percentage Error(MAPE) │     0.175796    │         0.237246    │         0.16459     │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Neg.Mean Squared Error(RMSE)         │    -0.000119847 │        -0.000192583 │        -0.000102487 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ R-squared score                      │    -0.715665    │        -1.75691     │        -0.467142    │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Error(MSE)              │     0.000119847 │         0.000192583 │         0.000102487 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Log Error(MSLE)         │     0.000108135 │         0.000172594 │         9.24266e-05 │
╘══════════════════════════════════════╧═════════════════╧═════════════════════╧═════════════════════╛
 Model Testing 
'Poly_d2 : y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'Poly_d3_log: y =0.25600 * np.log( 0.34800*x) + 0.00793 * x**3 + -0.06000 * x + 0.34300'
'Poly_d5_log: y =0.08500 * np.log( 1.44000*x) + -0.00058 * x**5'
╒══════════════════════════════════════╤═════════════════╤═════════════════════╤═════════════════════╕
│ Type Error                           │   Poly_d2 Error │   Poly_d3_log Error │   Poly_d5_log Error │
╞══════════════════════════════════════╪═════════════════╪═════════════════════╪═════════════════════╡
│ Mean Absolute Error(MAE)             │     0.00436791  │         0.00903178  │         0.00325997  │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Absolute Percentage Error(MAPE) │     0.0967441   │         0.177512    │         0.0716103   │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Neg.Mean Squared Error(RMSE)         │    -4.6615e-05  │        -9.55576e-05 │        -2.57461e-05 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ R-squared score                      │     0.310725    │        -0.412967    │         0.619305    │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Error(MSE)              │     4.6615e-05  │         9.55576e-05 │         2.57461e-05 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Log Error(MSLE)         │     4.23611e-05 │         8.57629e-05 │         2.346e-05   │
╘══════════════════════════════════════╧═════════════════╧═════════════════════╧═════════════════════╛

Winning Model¶

In [44]:
def get_wining_model(uncorrelated_weighted_portfolio_trails_simulation_df):
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, \
    x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, y_test_poly_d2, y_model_test_pred_poly_d2, \
    y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
    y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form= test_the_model(uncorelated_weighted_portfolio_trails_simulation_df)
    
    return model_poly_d2, popt_poly_d2, poly_d2_form

Prediction when the investor's risk level metric (portfolio standard deviation) is known¶

Here we will use the wining efficient frontier model to predict the portfolio expected return. Then will calculater the portfolio weightsand and investment strategy The following 2 Strategies will be implemented to manage the volatility:

  1. Asset Allocation: Adjusting the proportion of different asset classes in a portfolio to balance risk.
  2. Diversification: Spreading investments across various sectors.
In [45]:
def predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, risk):
    model_poly_d2, popt_poly_d2, poly_d2_form = get_wining_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    display(poly_d2_form)
    return model_poly_d2(risk, *popt_poly_d2)
    
    
pred_portfolio_expected_return  = predict_portfolio_expectded_return(uncorelated_weighted_portfolio_trails_simulation_df, 1.3)
   
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
In [46]:
def get_assets_expected_returns_and_stickers(log_returns, most_diversify_portfolio_assets_list):
    #uncorrelated_assets_log_returns = uncorrelated_assets_returns_log_returns_df(log_returns, selecting_uncorrelated_assets(log_returns,threshold))
    uncorrelated_assets_log_returns = uncorrelated_assets_returns_log_returns_df(log_returns, most_diversify_portfolio_assets_list)
    uncorrelated_assets_expected_return = uncorrelated_assets_log_returns.mean()
    #display(uncorrelated_assets_espected_return)
    #type(uncorrelated_assets_espected_return)
    assets_ticker_list = uncorrelated_assets_expected_return.index.tolist()
    #display(assets_ticker_list)
    assets_expected_returns_list = uncorrelated_assets_expected_return.to_list()
    return assets_expected_returns_list, assets_ticker_list

assets_expected_returns_list, assets_ticker_list = get_assets_expected_returns_and_stickers(log_returns,most_diversify_portfolio_assets_list)
In [47]:
def get_portfolio_investment_strategy_df( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                         most_diversify_portfolio_assets_list, portfolio_risk):

    sum_weight_and_portfolio_return_list = []
       
    #stocks expected return
    assets_expected_returns_list, assets_ticker_list = get_assets_expected_returns_and_stickers(log_returns,most_diversify_portfolio_assets_list)
    assets_expected_returns_list = np.array(assets_expected_returns_list)*100
    assets_expected_returns_list = list(np.round(assets_expected_returns_list, 3))
    
    #predicted portfolio expected return, given the portfolio volatility(risk)
    portfolio_return_predicted_value= round(predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk),3)
    
    #assets expected return absolute deviation from the portfolio expected return
    assets_expected_return_absolute_deviation_list = abs(portfolio_return_predicted_value - assets_expected_returns_list)
    assets_expected_return_absolute_deviation_list = list(np.round(assets_expected_return_absolute_deviation_list, 3))
    sum_expected_return_absolute_deviation = round(sum(assets_expected_return_absolute_deviation_list),3)
    
    #assets weight coefficients list
    assets_weight_list = assets_expected_return_absolute_deviation_list/sum_expected_return_absolute_deviation
    assets_weight_list = list(np.round(assets_weight_list, 3))
    
    #include the index content into the portfolio strategy data frame
    portfolio_content_df = index_content_df[index_content_df['Ticker'].isin(assets_ticker_list)]                   
    
    #portfolio strategy data frame
    portfolio_investment_strategy_df = pd.DataFrame({'Ticker':assets_ticker_list,'Weight':assets_weight_list,
                                                     'Asset Espected Returns':assets_expected_returns_list})
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Weight',ascending=True)
    
    #merge content data frame and the weght data frame
    portfolio_investment_strategy_df = pd.merge(portfolio_content_df, portfolio_investment_strategy_df, how="inner", on=["Ticker"])
   
    return portfolio_investment_strategy_df


portfolio_investment_strategy_df = get_portfolio_investment_strategy_df( log_returns, 
                                                    uncorelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, 1.3)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
In [48]:
def portfolio_annotation(x, y, text, ax):
    # Loop for annotation of all points 
    for i in range(len(x)): 
        ax.annotate(text[i]+'(σp='+str(round(x[i],3))+';E_rp='+ str(round(y[i],3))+')',
                    xy=(x[i], y[i]),xycoords='data', xytext= (x[i], y[i] ))  
  
In [49]:
def plotting_selected_efficient_frontier_predicted_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df,risk):
    fig, ax =plt.subplots(figsize=(12, 5))
    text = ["A", "B", "C", "D", "E", "F"] 
    model_poly_d2, popt_poly_d2, poly_d2_form = get_wining_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    predicted_return = model_poly_d2(risk, *popt_poly_d2)
    ax.plot(risk, predicted_return,'*',color='red',label='Optimal portfolios')
    scplt = model_uperBound_efficient_frontier(uncorrelated_weighted_portfolio_trails_simulation_df, model_poly_d2,popt_poly_d2, ax, poly_d2_form)
    portfolio_annotation(risk, predicted_return, text, ax)
    cb = fig.colorbar(scplt, ax=ax, label='Sharpe Ratio')
In [50]:
def plot_investment_strategy_pie_chart(log_returns, uncorelated_weighted_portfolio_trails_simulation_df, 
                                       most_diversify_portfolio_assets_list, portfolio_risk, risk_profile = ''):
 
    portfolio_investment_strategy_df = get_portfolio_investment_strategy_df( \
                log_returns,uncorelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, portfolio_risk)
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Weight',ascending=True)
    
    industry_labels = portfolio_investment_strategy_df['Industry'].values
    sector_labels = portfolio_investment_strategy_df['Sector'].values
    weight_values = portfolio_investment_strategy_df['Weight'].values
    
    # Create subplots: use 'domain' type for Pie subplot
    fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
    
    fig.add_trace(go.Pie(labels=industry_labels, values=weight_values, name="Industry",
                        legendgroup="Industry",  # this can be any string, not just "group"
                        legendgrouptitle_text="Industry"), 1, 1)
    fig.add_trace(go.Pie(labels=sector_labels, values=weight_values, name="Sector",
                        legendgroup="Sector",  # this can be any string, not just "group"
                        legendgrouptitle_text="Sector"), 1, 2)
    
    

    # Use `hole` to create a donut-like pie chart
    fig.update_traces(hole=.5, hoverinfo="label+percent+name")

    fig.update_layout(
    title_text= risk_profile+" Suggested Investment by Industry & Sector",
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='Industry', x=0.14, y=0.5, font_size=20, showarrow=False),
                 dict(text='Sector', x=0.84, y=0.5, font_size=20, showarrow=False)],
    height=500, 
    width=800,
    autosize=True,
    margin=dict(t=0, b=0, l=50, r=0),
    legend_tracegroupgap = 0,
    legend=dict(    
                    orientation="v",
                    yanchor="bottom",
                    y=0,
                    xanchor="right",
                    x=1.5),
     title=dict(
                    y=0.9,
                    x=0.1,
                    xanchor= 'left',
                    yanchor= 'top'))
    
    fig.show()
In [51]:
def plot_asset_return_pie_chart(log_returns, uncorelated_weighted_portfolio_trails_simulation_df,
                                most_diversify_portfolio_assets_list, portfolio_risk, risk_profile = ''):
 
    portfolio_investment_strategy_df = get_portfolio_investment_strategy_df( \
                log_returns,uncorelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, portfolio_risk)
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Espected Returns',ascending=True)
    
    industry_labels = portfolio_investment_strategy_df['Industry'].values
    sector_labels = portfolio_investment_strategy_df['Sector'].values
    weight_values = portfolio_investment_strategy_df['Asset Espected Returns'].values
    
    # Create subplots: use 'domain' type for Pie subplot
    fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
    
    fig.add_trace(go.Pie(labels=industry_labels, values=weight_values, name="Industry",
                        legendgroup="Industry",  # this can be any string, not just "group"
                        legendgrouptitle_text="Industry"), 1, 1)
    fig.add_trace(go.Pie(labels=sector_labels, values=weight_values, name="Sector",
                        legendgroup="Sector",  # this can be any string, not just "group"
                        legendgrouptitle_text="Sector"), 1, 2)

    # Use `hole` to create a donut-like pie chart
    fig.update_traces(hole=.5, hoverinfo="label+percent+name")

    fig.update_layout(
    title_text=risk_profile+" Asset Returns by Industry & Sector",
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='Industry', x=0.14, y=0.5, font_size=20, showarrow=False),
                 dict(text='Sector', x=0.84, y=0.5, font_size=20, showarrow=False)],
    height=500, 
    width=800,
    autosize=True,
    margin=dict(t=0, b=0, l=50, r=0),
    legend_tracegroupgap = 0,
    legend=dict(    
                    orientation="v",
                    yanchor="bottom",
                    y=0,
                    xanchor="right",
                    x=1.5),
     title=dict(
                    y=0.9,
                    x=0.1,
                    xanchor= 'left',
                    yanchor= 'top'))
    
    fig.show()
In [52]:
#Finding weights of portfolio when return given
def plot_asset_return( log_returns, uncorelated_weighted_portfolio_trails_simulation_df, 
                      most_diversify_portfolio_assets_list, portfolio_risk,risk_profile = ''):
    fig, ax =plt.subplots(figsize=(12, 6))
      
    #plotting Asset Espected Returns
    portfolio_investment_strategy_df = get_portfolio_investment_strategy_df( \
                log_returns,uncorelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, portfolio_risk)
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Espected Returns',ascending=True)
    column_list = [':      ' for i in range(len(portfolio_investment_strategy_df))]
    column_df = pd.DataFrame({'colum': column_list})
    
    asset_return = portfolio_investment_strategy_df['Asset Espected Returns']
    strategy_Stickers = portfolio_investment_strategy_df['Sector'] + column_df['colum'] + \
                        portfolio_investment_strategy_df['Industry'] + column_df['colum'] + \
                        portfolio_investment_strategy_df['Company'] + \
                        column_df['colum'] + portfolio_investment_strategy_df['Ticker'] 
    
    bar_container= ax.barh(strategy_Stickers, asset_return*100)
    ax.axes.get_xaxis().set_visible(False)
    # setting label of y-axis
    ax.set_ylabel("Asset Tickers")
    # setting label of x-axis
    ax.set_xlabel("Asset Return") 
    ax.set_title(risk_profile+" Asset Return",fontsize=22,  horizontalalignment='right',fontweight='roman')
    ax.bar_label(bar_container, fmt='{:,.1f}%')
    
    
    plt.show()
    #Asset return pie chart
    plot_asset_return_pie_chart( log_returns, uncorelated_weighted_portfolio_trails_simulation_df, 
                                most_diversify_portfolio_assets_list, portfolio_risk, risk_profile)  
In [53]:
#Finding weights of portfolio when return given
def plot_predicted_portfolio_weight( log_returns, uncorelated_weighted_portfolio_trails_simulation_df, 
                                    most_diversify_portfolio_assets_list, portfolio_risk, risk_profile = ''):
    fig, ax =plt.subplots(figsize=(12, 6))
    portfolio_investment_strategy_df = get_portfolio_investment_strategy_df( \
                log_returns,uncorelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, portfolio_risk)
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Weight',ascending=True)
    column_list = [':      ' for i in range(len(portfolio_investment_strategy_df))]
    column_df = pd.DataFrame({'colum': column_list})
   
    #plotting
    display(portfolio_investment_strategy_df.style.hide(axis='index'))
    
    strategy_Weight = portfolio_investment_strategy_df['Weight']
    strategy_Stickers = portfolio_investment_strategy_df['Sector'] + column_df['colum'] + \
                        portfolio_investment_strategy_df['Industry'] + column_df['colum'] + \
                        portfolio_investment_strategy_df['Company'] + \
                        column_df['colum'] + portfolio_investment_strategy_df['Ticker'] 
    bar_container= ax.barh(strategy_Stickers, strategy_Weight*100)
   
    ax.axes.get_xaxis().set_visible(False)
    #setting label of y-axis
    ax.set_ylabel("Asset Stickers")
    # setting label of x-axis
    ax.set_xlabel("Portfolio Weight") 
    ax.set_title(risk_profile+" suggested Portfolio Allocation", fontsize=22, horizontalalignment='right')
    ax.bar_label(bar_container, fmt='{:,.1f}%')
        
    plt.show()
    
    #Investement strategy pie chart
    plot_investment_strategy_pie_chart( log_returns, uncorelated_weighted_portfolio_trails_simulation_df, 
                                       most_diversify_portfolio_assets_list, portfolio_risk, risk_profile)
      
In [54]:
def get_portolio_risk_input(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk):
    predited_portfolio_return = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk)
    prediction_df = pd.DataFrame([{'portfolio_risk':portfolio_risk,'Predited Portfolio Return':predited_portfolio_return}])
    display(prediction_df.style.hide(axis='index'))
In [55]:
def risk_tolerence_threshold(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk = 1.3):
    #define threshold to track the investor risk tolerence:High risk tolerance (aggressive investors), Moderate risk tolerance (moderate investors)
    #Low risk tolerance (conservative investors)
    pred_random_portfolio_return = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk)
    
    max_E_rp_sharpe_ratio, max_E_rp, max_E_rp_σp = get_maximun_return_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df)
    pred_maximun_return_portfolio = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, max_E_rp_σp)
    
    max_σp_E_rp_sharpe_ratio, max_σp_E_rp, max_σp = get_maximun_risk_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df)
    pred_maximun_risk_portfolio = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, max_σp)

    maximum_sharpe_ratio, maximum_sharpe_ratio_σp_E_rp, maximum_sharpe_ratio_σp =  get_maximum_sharpe_ratio(uncorrelated_weighted_portfolio_trails_simulation_df)
    pred_maximum_sharpe_ratio = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, maximum_sharpe_ratio_σp)

    minimum_σp_E_rp_sharpe_ratio, minimum_σp_E_rp, minimum_σp = get_minimum_risk_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df)
    pred_minimum_risk_portfolio = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, minimum_σp)
    avg_risk = uncorrelated_weighted_portfolio_trails_simulation_df['σp'].mean()
    pred_avg_risk_Expected_return = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, avg_risk)
    
    index = ['A', 'B', 'C', 'D', 'E', 'F'] 
    risk_tolerence_threshold_df =pd.DataFrame({'Portfolio Type': ['Random Portfolio', 'Maximun Return Portfolio','Maximun Risk Portfolio',
                                                                  'Maximum Sharpe Ratio(Tangent Portfolio)', 
                                              'Minimum Risk Portfolio', 'Average Volatilty'],
                              'Predicted Expected Return': [pred_random_portfolio_return, pred_maximun_return_portfolio, pred_maximun_risk_portfolio, 
                                                            pred_maximum_sharpe_ratio, pred_minimum_risk_portfolio, pred_avg_risk_Expected_return ],
                              'Portfolio Risk(volatility)':[portfolio_risk, max_E_rp_σp, max_σp, maximum_sharpe_ratio_σp, minimum_σp, avg_risk],          
                              'Sharpe Ratio':[pred_random_portfolio_return/portfolio_risk, pred_maximun_return_portfolio/maximum_sharpe_ratio_σp, 
                                              pred_maximun_risk_portfolio/max_σp, pred_maximum_sharpe_ratio/maximum_sharpe_ratio_σp ,
                                              pred_minimum_risk_portfolio/ minimum_σp, pred_avg_risk_Expected_return/avg_risk]},
                                           index=index)
    
    return risk_tolerence_threshold_df
In [56]:
def plot_risk_tolerence_treshold(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk = 1.3):
    risk_tolerence_threshold_df = risk_tolerence_threshold(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk)

    print('\n                                              **********************************************************************\n'+
              '                                               Optimal Portfolio Table - Winning Model and Efficient Frontier\n'+
              '                                              **********************************************************************\n')
    display(risk_tolerence_threshold_df)
    portfolio_risk_values =  risk_tolerence_threshold_df['Portfolio Risk(volatility)'].values
    plotting_selected_efficient_frontier_predicted_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df,portfolio_risk_values)
 
In [57]:
def plot_suggested_portfolio_structure( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                       most_diversify_portfolio_assets_list, portfolio_risk):
    
    risk_tolerence_threshold_df =  risk_tolerence_threshold(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk)
    
    for i in range(len(risk_tolerence_threshold_df)):
        portfolio_risk = risk_tolerence_threshold_df['Portfolio Risk(volatility)'][i]
        predicted_expected_return = risk_tolerence_threshold_df['Predicted Expected Return'][i]
        sharpe_ratio = risk_tolerence_threshold_df['Sharpe Ratio'][i]
        print('\n                                    *************************************\n'+
              '                                      Portfolio Risk(volatility)  : '+str(round(portfolio_risk,3))+'\n'+
              '                                      Predicted Expected Return   : '+str(round(predicted_expected_return,3))+'\n'+
              '                                      Sharpe Ratio                : '+str(round(sharpe_ratio,3))+'\n'
              '                                     *************************************\n')
        plot_predicted_portfolio_weight( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                        most_diversify_portfolio_assets_list, portfolio_risk)    
        plot_asset_return( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, portfolio_risk)  
In [58]:
def risk_tolerence_encoding(uncorrelated_weighted_portfolio_trails_simulation_df):
    
    risk_tolerence_threshold_df = risk_tolerence_threshold(uncorrelated_weighted_portfolio_trails_simulation_df)
    max_Erp = risk_tolerence_threshold_df['Portfolio Risk(volatility)']['B']
    max_riskp = risk_tolerence_threshold_df['Portfolio Risk(volatility)']['C']
    max_shape_ratiop = risk_tolerence_threshold_df['Portfolio Risk(volatility)']['D']
    min_riskp = risk_tolerence_threshold_df['Portfolio Risk(volatility)']['E']
    avg_riskp = risk_tolerence_threshold_df['Portfolio Risk(volatility)']['F'] 
    #σp	E_rp
    
    simulated_risk_list =  uncorelated_weighted_portfolio_trails_simulation_df['σp']
    pred_Expected_return_list = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, simulated_risk_list)
    sharpe_ratio_list =pred_Expected_return_list/simulated_risk_list
    
    risk_profile_list = []
    risk_profile_encoding_list = []
    
    for i in range(len(simulated_risk_list)):
        portfolio_risk = simulated_risk_list[i]
        if portfolio_risk >= max_shape_ratiop and portfolio_risk <=avg_riskp :
            risk_profile_list.append('Moderate')
            risk_profile_encoding_list.append(1)
        elif portfolio_risk <max_shape_ratiop:
            risk_profile_list.append('Conservative')
            risk_profile_encoding_list.append(2)            
        elif portfolio_risk  > avg_riskp:
            risk_profile_list.append('Aggressive')
            risk_profile_encoding_list.append(3)
            
    risk_tolerence_rating_df= pd.DataFrame({'Simulated Risk': simulated_risk_list, 'Predicted Expected Return':pred_Expected_return_list,
                                            'Sharpe Ratio':sharpe_ratio_list, 'Risk Profile':risk_profile_list, 
                                            'Risk Profile Encoding Value':risk_profile_encoding_list})           
    return  risk_tolerence_rating_df     
        
risk_tolerence_encoding_df = risk_tolerence_encoding(uncorelated_weighted_portfolio_trails_simulation_df)
display(risk_tolerence_encoding_df)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Simulated Risk Predicted Expected Return Sharpe Ratio Risk Profile Risk Profile Encoding Value
0 1.463924 0.060298 0.041189 Conservative 2
1 1.456257 0.059883 0.041121 Conservative 2
2 1.519654 0.062076 0.040849 Conservative 2
3 1.447017 0.059329 0.041001 Conservative 2
4 1.590675 0.061184 0.038464 Aggressive 3
... ... ... ... ... ...
9995 1.450366 0.059537 0.041049 Conservative 2
9996 1.499805 0.061692 0.041134 Conservative 2
9997 1.466001 0.060404 0.041203 Conservative 2
9998 1.368392 0.052185 0.038136 Conservative 2
9999 1.480896 0.061070 0.041239 Conservative 2

10000 rows × 5 columns

In [59]:
def get_risk_profile_matrix(uncorelated_weighted_portfolio_trails_simulation_df):
    risk_tolerence_encoding_df = risk_tolerence_encoding(uncorelated_weighted_portfolio_trails_simulation_df)
    risk_profile_matrix =  risk_tolerence_encoding_df.groupby('Risk Profile')[['Simulated Risk','Predicted Expected Return','Sharpe Ratio']].mean()
    return pd.DataFrame(risk_profile_matrix)
   
risk_profile_matrix = get_risk_profile_matrix(uncorelated_weighted_portfolio_trails_simulation_df)
risk_profile_matrix
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Out[59]:
Simulated Risk Predicted Expected Return Sharpe Ratio
Risk Profile
Aggressive 1.610688 0.060046 0.037308
Conservative 1.455506 0.058903 0.040447
In [60]:
def plot_suggested_risk_profile_portfolio_structure( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                                    most_diversify_portfolio_assets_list, portfolio_risk=1.3):
    
    #risk_tolerence_threshold_df =  risk_tolerence_threshold(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk)
    risk_profile_matrix = get_risk_profile_matrix(uncorelated_weighted_portfolio_trails_simulation_df)
    
    print('\n                                              **********************************************************************\n'+
              '                                                    Investment Profile Simulation And Portfolio Allocation \n'+
              '                                              **********************************************************************\n')
    display(risk_profile_matrix)
    for i in range(len(risk_profile_matrix)):
        risk_profile = risk_profile_matrix.index[i]
        portfolio_risk = risk_profile_matrix['Simulated Risk'][i]
        predicted_expected_return = risk_profile_matrix['Predicted Expected Return'][i]
        sharpe_ratio = risk_profile_matrix['Sharpe Ratio'][i]
        print('\n                                    *****************************************************\n'+
              '                                      Risk Profile                : '+risk_profile+' Investment \n'+
              '                                      Simulated Risk              : '+str(round(portfolio_risk,3))+'\n'+
              '                                      Predicted Expected Return   : '+str(round(predicted_expected_return,3))+'\n'+
              '                                      Sharpe Ratio                : '+str(round(sharpe_ratio,3))+'\n'
              '                                     *****************************************************\n')
        plot_predicted_portfolio_weight( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                        most_diversify_portfolio_assets_list, portfolio_risk,risk_profile)    
        plot_asset_return( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                          most_diversify_portfolio_assets_list, portfolio_risk,risk_profile)  
In [61]:
plot_risk_tolerence_treshold(uncorelated_weighted_portfolio_trails_simulation_df)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                              **********************************************************************
                                               Optimal Portfolio Table - Winning Model and Efficient Frontier
                                              **********************************************************************

Portfolio Type Predicted Expected Return Portfolio Risk(volatility) Sharpe Ratio
A Random Portfolio 0.042446 1.300000 0.032651
B Maximun Return Portfolio 0.061299 1.587529 0.038613
C Maximun Risk Portfolio 0.050754 1.717820 0.029546
D Maximum Sharpe Ratio(Tangent Portfolio) 0.061299 1.587529 0.038613
E Minimum Risk Portfolio 0.034486 1.256201 0.027452
F Average Volatilty 0.059935 1.457167 0.041131
In [62]:
plot_suggested_portfolio_structure( log_returns, uncorelated_weighted_portfolio_trails_simulation_df, 
                                   most_diversify_portfolio_assets_list, 1.3)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.3
                                      Predicted Expected Return   : 0.042
                                      Sharpe Ratio                : 0.033
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.006000 0.040000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.011000 0.038000
CWB Canadian Western Bank Financial Services Banks 0.014000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.017000 0.036000
BMO Bank of Montreal Financial Services Banks 0.022000 0.034000
BN Brookfield Corporation Financial Services Asset Management 0.022000 0.050000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.022000 0.034000
ATS ATS Corporation Industrials Industrial Products 0.028000 0.052000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.036000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.044000 0.026000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.072000 0.016000
IGM IGM Financial Inc. Financial Services Asset Management 0.094000 0.076000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.099000 0.006000
CIX CI Financial Corp. Financial Services Asset Management 0.102000 0.079000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.105000 0.080000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.110000 0.082000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.196000 0.113000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.588
                                      Predicted Expected Return   : 0.061
                                      Sharpe Ratio                : 0.039
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.020000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.024000 0.050000
IGM IGM Financial Inc. Financial Services Asset Management 0.033000 0.076000
CIX CI Financial Corp. Financial Services Asset Management 0.039000 0.079000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.041000 0.080000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.046000 0.040000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.046000 0.082000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.050000 0.038000
CWB Canadian Western Bank Financial Services Banks 0.052000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.054000 0.036000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.059000 0.034000
BMO Bank of Montreal Financial Services Banks 0.059000 0.034000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.070000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.076000 0.026000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.098000 0.016000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.113000 0.113000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.120000 0.006000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.718
                                      Predicted Expected Return   : 0.051
                                      Sharpe Ratio                : 0.03
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.003000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.003000 0.050000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.028000 0.040000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.033000 0.038000
CWB Canadian Western Bank Financial Services Banks 0.036000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.038000 0.036000
BMO Bank of Montreal Financial Services Banks 0.043000 0.034000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.043000 0.034000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.056000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.064000 0.026000
IGM IGM Financial Inc. Financial Services Asset Management 0.064000 0.076000
CIX CI Financial Corp. Financial Services Asset Management 0.072000 0.079000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.074000 0.080000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.079000 0.082000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.090000 0.016000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.115000 0.006000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.159000 0.113000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.588
                                      Predicted Expected Return   : 0.061
                                      Sharpe Ratio                : 0.039
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.020000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.024000 0.050000
IGM IGM Financial Inc. Financial Services Asset Management 0.033000 0.076000
CIX CI Financial Corp. Financial Services Asset Management 0.039000 0.079000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.041000 0.080000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.046000 0.040000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.046000 0.082000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.050000 0.038000
CWB Canadian Western Bank Financial Services Banks 0.052000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.054000 0.036000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.059000 0.034000
BMO Bank of Montreal Financial Services Banks 0.059000 0.034000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.070000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.076000 0.026000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.098000 0.016000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.113000 0.113000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.120000 0.006000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.256
                                      Predicted Expected Return   : 0.034
                                      Sharpe Ratio                : 0.027
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
BMO Bank of Montreal Financial Services Banks 0.000000 0.034000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.000000 0.034000
SLF Sun Life Financial Inc. Financial Services Insurance 0.005000 0.036000
CWB Canadian Western Bank Financial Services Banks 0.008000 0.037000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.011000 0.038000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.014000 0.029000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.016000 0.040000
TD Toronto-Dominion Bank Financial Services Banks 0.022000 0.026000
BN Brookfield Corporation Financial Services Asset Management 0.043000 0.050000
ATS ATS Corporation Industrials Industrial Products 0.049000 0.052000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.049000 0.016000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.076000 0.006000
IGM IGM Financial Inc. Financial Services Asset Management 0.114000 0.076000
CIX CI Financial Corp. Financial Services Asset Management 0.122000 0.079000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.125000 0.080000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.130000 0.082000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.215000 0.113000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.457
                                      Predicted Expected Return   : 0.06
                                      Sharpe Ratio                : 0.041
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.018000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.022000 0.050000
IGM IGM Financial Inc. Financial Services Asset Management 0.035000 0.076000
CIX CI Financial Corp. Financial Services Asset Management 0.042000 0.079000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.044000 0.040000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.044000 0.080000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.049000 0.038000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.049000 0.082000
CWB Canadian Western Bank Financial Services Banks 0.051000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.053000 0.036000
BMO Bank of Montreal Financial Services Banks 0.058000 0.034000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.058000 0.034000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.069000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.075000 0.026000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.097000 0.016000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.117000 0.113000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.119000 0.006000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
In [63]:
plot_suggested_risk_profile_portfolio_structure( log_returns, uncorelated_weighted_portfolio_trails_simulation_df, 
                                                most_diversify_portfolio_assets_list, portfolio_risk=1.3)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                              **********************************************************************
                                                    Investment Profile Simulation And Portfolio Allocation 
                                              **********************************************************************

Simulated Risk Predicted Expected Return Sharpe Ratio
Risk Profile
Aggressive 1.610688 0.060046 0.037308
Conservative 1.455506 0.058903 0.040447
                                    *****************************************************
                                      Risk Profile                : Aggressive Investment 
                                      Simulated Risk              : 1.611
                                      Predicted Expected Return   : 0.06
                                      Sharpe Ratio                : 0.037
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.018000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.022000 0.050000
IGM IGM Financial Inc. Financial Services Asset Management 0.035000 0.076000
CIX CI Financial Corp. Financial Services Asset Management 0.042000 0.079000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.044000 0.040000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.044000 0.080000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.049000 0.038000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.049000 0.082000
CWB Canadian Western Bank Financial Services Banks 0.051000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.053000 0.036000
BMO Bank of Montreal Financial Services Banks 0.058000 0.034000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.058000 0.034000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.069000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.075000 0.026000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.097000 0.016000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.117000 0.113000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.119000 0.006000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Conservative Investment 
                                      Simulated Risk              : 1.456
                                      Predicted Expected Return   : 0.059
                                      Sharpe Ratio                : 0.04
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.018000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.022000 0.050000
IGM IGM Financial Inc. Financial Services Asset Management 0.035000 0.076000
CIX CI Financial Corp. Financial Services Asset Management 0.042000 0.079000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.044000 0.040000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.044000 0.080000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.049000 0.038000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.049000 0.082000
CWB Canadian Western Bank Financial Services Banks 0.051000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.053000 0.036000
BMO Bank of Montreal Financial Services Banks 0.058000 0.034000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.058000 0.034000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.069000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.075000 0.026000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.097000 0.016000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.117000 0.113000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.119000 0.006000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'

K-means Clustering in Portfolio Analytics - Investment Risk Profiles Simulation¶

My main objective is to use K-means clustering to found out the investment type of risk tolerence: very conservative, conservative, moderate, aggressive and very aggressive. From the Elbow model, I will assume the optimal number of clusters is 5

In this section, I will combine K-means clustering with efficient frontier modeling to dig into the randomn generated portfolios. In order to simulate the investors risk tolerence, I will use K-means clustering and optimal portfolio modeling on top of the privious covarience matrix technic that I used along with the covarience treshold to reduce the volume of the assets. I used the covarience coefficient to filter uncorelated assets. Then I will recommend an investment stratagy that optimize return for each type for risk tolerence. I'm using the wining model 'y =0.07000 x^2 + -0.01600 x + -0.00900' to predict the portfolio expected returns. The simulated portfolio risk is combigned with the simulated the portfolio expected return and the predicted expected return to set the randomn efficient frontier data. The randomn efficient frontier data is then used as input for the K-means cluster models

In [64]:
def calculate_number_of_cluster(uncorrelated_weighted_portfolio_trails_simulation_df, n_components, ax):
    # Randomn efficient frontier data collection
    portfolio_risk_list = uncorrelated_weighted_portfolio_trails_simulation_df['σp']
    portefolio_return_list = uncorrelated_weighted_portfolio_trails_simulation_df['E_rp']
    predited_portfolio_return_list = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk_list)
    clipped_df = dataframe_clipping(portfolio_risk_list, portefolio_return_list, predited_portfolio_return_list )
    
    range_nbr_clusters = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    # Step 2: Standardize the data
    scaler = StandardScaler()
    scaled_efficient_Frontier_data = scaler.fit_transform(clipped_df)


    #Determine the Number of clusters using Within Cluster Sum of Squares(wcss)
    wcss = [] # (Within Cluster Sum of Squares:inertia)
    silhouette_average_list = []
    
    for n2 in range_nbr_clusters:
        kmeans = KMeans(n_clusters=n2, init ='k-means++', max_iter=300,  n_init=10,random_state=0 )
        
        kmeans.fit(clipped_df)
        wcss.append(kmeans.inertia_)
        cluster_labels = kmeans.fit_predict(clipped_df)
        silhouette_average_list.append(silhouette_score(clipped_df, cluster_labels))
    
    ax1 = ax.twinx()
    ax.plot(range_nbr_clusters, wcss, 'b-', marker='o')
    ax1.plot(range_nbr_clusters,silhouette_average_list, 'g-', marker='o')
    
    ax.set_xlabel('Number of Clusters')
    ax.set_ylabel('Within Cluster Sum of Squares(wcss)')
    ax1.set_ylabel('Silhouette score')
    ax.set_title('Elbow Method & Silhouette Analysis for Optimal Number of Clusters') 
    #plt.show()
    
    return clipped_df
In [65]:
def implement_k_means_clusters(clipped_df, fig, ax):
    # our main objective is to use K-means clustering to found out investment risk profile: very conservative, conservative, moderate, aggressive and very aggressive.
    # So from the Elbow model, let's assume the optimal number of clusters is 5

    kmeans = KMeans(n_clusters=5, init ='k-means++', max_iter=300, n_init=10,random_state=0 )
    #clusters = kmeans.fit_predict(pca_data)
    pred_clusters = kmeans.fit_predict(clipped_df)
    
    rand_data_point_and_cluster_df = clipped_df
    rand_data_point_and_cluster_df['cluster'] = pred_clusters
    
    
    #investment profile
    investment_profiles_index = ['Moderate', 'Conservative', 'Agressive', 'Very Aggressive', 'Very Conservative'] 
    investment_profiles_color = ['purple', 'gold', 'limegreen', 'green', 'yellow'] 
    display(rand_data_point_and_cluster_df)
    
    #plot cluster
    for i in range(len(investment_profiles_index)):
        cspl = ax.scatter(x=rand_data_point_and_cluster_df.loc[(rand_data_point_and_cluster_df['cluster'] ==i), ['σp']], 
                      y=rand_data_point_and_cluster_df.loc[(rand_data_point_and_cluster_df['cluster'] ==i), ['E_rp']], 
                      c= investment_profiles_color[i], cmap="viridis",label=investment_profiles_index[i])   
        
    # find clusters centratides
    cluster_centers_df = pd.DataFrame(kmeans.cluster_centers_)
    cluster_centers_df = cluster_centers_df.set_axis( kmeans.feature_names_in_ , axis=1)
    cluster_centers_df.index =  investment_profiles_index
    cluster_centers_df.index.names = ['Investment Profile']   
    
    #plot clusters centroid. 
    ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker=".", s=100, c='red', label = 'Cluster Centroids')
    
    #plot efficienr frontier model
    model_poly_d2, popt_poly_d2, poly_d2_form = get_wining_model(clipped_df)
    xpoints,ypoints,top_sharpe_ratio_value_points = efficient_frontiere_optimal_portfolios_model_points(clipped_df,7)
    x_model_σp = np.linspace(xpoints.min(), xpoints.max(), len(clipped_df))
    y_model_E_rp_pred = model_poly_d2(x_model_σp, *popt_poly_d2)
    cspl = ax.scatter(x=x_model_σp, y=y_model_E_rp_pred, marker="*", c= y_model_E_rp_pred/x_model_σp,
                      cmap="viridis",label='Efficient Frontier:\n'+poly_d2_form)
    
    #find model predicted centroide expected return
    pred_centroide_Expr_list = []
    pred_centroide_Expr_list = model_poly_d2(cluster_centers_df['σp'], *popt_poly_d2)
   
   
    cluster_centers_df['Pred Centroide Expr'] =pred_centroide_Expr_list
    cluster_centers_df['Pred Centroide Sharpe Ratio'] =pred_centroide_Expr_list/cluster_centers_df['σp']
    display(cluster_centers_df)
    
    #plotting model centroide
    ax.scatter(cluster_centers_df['σp'], pred_centroide_Expr_list, marker=".", s=100, c='blue', label = 'Model Centroids')
    
    ax.set_title('Simulated Porfolio Clusters')
    ax.set_xlabel('Volatility(Risk)')
    ax.set_ylabel('Expected Return ')
    ax.legend(prop = { "size": 8 })  
    plt.show()
    
    # Silhouette Score to evaluate the clustering
    sil_score = silhouette_score(clipped_df, pred_clusters)
    print(f'Silhouette Score: {sil_score}')
    return cluster_centers_df
In [66]:
def plot_predicted_clusters_risk_profile_portfolio_allocation( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                                              most_diversify_portfolio_assets_list, cluster_centers_df):
    
    
    print('\n                           **********************************************************************\n'+
              '                                 Investment Profile Simulation And Portfolio Allocation \n'+
              '                           **********************************************************************\n')
    for i in range(len(cluster_centers_df)):
        risk_profile = cluster_centers_df.index[i]
        portfolio_risk = cluster_centers_df['σp'][i]
        predicted_expected_return = cluster_centers_df['Pred Centroide Expr'][i]
        sharpe_ratio = cluster_centers_df['Pred Centroide Sharpe Ratio'][i]
        print('\n                                    *****************************************************\n'+
              '                                      Risk Profile                : '+risk_profile+' Investment \n'+
              '                                      Simulated Risk              : '+str(round(portfolio_risk,3))+'\n'+
              '                                      Predicted Expected Return   : '+str(round(predicted_expected_return,3))+'\n'+
              '                                      Sharpe Ratio                : '+str(round(sharpe_ratio,3))+'\n'
              '                                     *****************************************************\n')
        plot_predicted_portfolio_weight( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                        most_diversify_portfolio_assets_list, portfolio_risk,risk_profile)    
        plot_asset_return( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                          most_diversify_portfolio_assets_list, portfolio_risk,risk_profile)  
In [67]:
def implement_investement_profile_simulation(log_returns, uncorrelated_weighted_portfolio_trails_simulation_df,
                                             most_diversify_portfolio_assets_list, n_components):
    fig, ax =plt.subplots(1,2,figsize=(21, 5))
    clipped_df = calculate_number_of_cluster(uncorelated_weighted_portfolio_trails_simulation_df,n_components, ax[0]) 
    print('\n                                 *********************************************************************************\n'+
              '                                  Investement profile simulation  - Optimal Portfolio - Efficient Frontier Model \n'+
              '                                 *********************************************************************************\n')
    cluster_centers_df = implement_k_means_clusters(clipped_df, fig, ax[1])
    plot_predicted_clusters_risk_profile_portfolio_allocation( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                                              most_diversify_portfolio_assets_list, cluster_centers_df)

implement_investement_profile_simulation(log_returns, uncorelated_weighted_portfolio_trails_simulation_df, 
                                         most_diversify_portfolio_assets_list, 2)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                 *********************************************************************************
                                  Investement profile simulation  - Optimal Portfolio - Efficient Frontier Model 
                                 *********************************************************************************

σp E_rp y_E_rp_pred error y_optimal_E_rp sharpes_ratio cluster
9088 1.518008 0.036252 0.062054 0.025803 0.036252 0.023881 4
9450 1.479494 0.037468 0.061014 0.023546 0.037468 0.025325 1
5096 1.487795 0.037854 0.061326 0.023472 0.037854 0.025443 4
371 1.482975 0.037747 0.061151 0.023403 0.037747 0.025454 4
8735 1.438078 0.035996 0.058735 0.022739 0.035996 0.025030 0
... ... ... ... ... ... ... ...
4488 1.330845 0.047093 0.047244 0.000151 0.047093 0.035386 2
9034 1.445931 0.059122 0.059259 0.000138 0.059122 0.040888 1
2486 1.409447 0.056408 0.056456 0.000048 0.056408 0.040021 0
2325 1.322570 0.045990 0.046022 0.000033 0.045990 0.034773 2
2839 1.366888 0.051998 0.052006 0.000008 0.051998 0.038041 2

9924 rows × 7 columns

C:\Users\atsuv\AppData\Local\Temp\ipykernel_13792\1421094688.py:20: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

C:\Users\atsuv\AppData\Local\Temp\ipykernel_13792\1421094688.py:20: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

C:\Users\atsuv\AppData\Local\Temp\ipykernel_13792\1421094688.py:20: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

C:\Users\atsuv\AppData\Local\Temp\ipykernel_13792\1421094688.py:20: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

C:\Users\atsuv\AppData\Local\Temp\ipykernel_13792\1421094688.py:20: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

σp E_rp y_E_rp_pred error y_optimal_E_rp sharpes_ratio Pred Centroide Expr Pred Centroide Sharpe Ratio
Investment Profile
Moderate 1.422478 0.047610 0.057513 0.009903 0.047610 0.033469 0.057564 0.040468
Conservative 1.462226 0.048986 0.060165 0.011178 0.048986 0.033502 0.060210 0.041177
Agressive 1.376375 0.045750 0.052988 0.007238 0.045750 0.033237 0.053108 0.038585
Very Aggressive 1.558043 0.051773 0.061842 0.010068 0.051773 0.033231 0.062033 0.039815
Very Conservative 1.502600 0.050109 0.061699 0.011591 0.050109 0.033349 0.061763 0.041104
Silhouette Score: 0.9811219752806893

                           **********************************************************************
                                 Investment Profile Simulation And Portfolio Allocation 
                           **********************************************************************


                                    *****************************************************
                                      Risk Profile                : Moderate Investment 
                                      Simulated Risk              : 1.422
                                      Predicted Expected Return   : 0.058
                                      Sharpe Ratio                : 0.04
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.014000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.018000 0.050000
IGM IGM Financial Inc. Financial Services Asset Management 0.041000 0.076000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.041000 0.040000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.046000 0.038000
CIX CI Financial Corp. Financial Services Asset Management 0.048000 0.079000
CWB Canadian Western Bank Financial Services Banks 0.048000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.050000 0.036000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.050000 0.080000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.055000 0.034000
BMO Bank of Montreal Financial Services Banks 0.055000 0.034000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.055000 0.082000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.066000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.073000 0.026000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.096000 0.016000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.119000 0.006000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.126000 0.113000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Conservative Investment 
                                      Simulated Risk              : 1.462
                                      Predicted Expected Return   : 0.06
                                      Sharpe Ratio                : 0.041
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.018000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.022000 0.050000
IGM IGM Financial Inc. Financial Services Asset Management 0.035000 0.076000
CIX CI Financial Corp. Financial Services Asset Management 0.042000 0.079000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.044000 0.040000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.044000 0.080000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.049000 0.038000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.049000 0.082000
CWB Canadian Western Bank Financial Services Banks 0.051000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.053000 0.036000
BMO Bank of Montreal Financial Services Banks 0.058000 0.034000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.058000 0.034000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.069000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.075000 0.026000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.097000 0.016000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.117000 0.113000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.119000 0.006000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Agressive Investment 
                                      Simulated Risk              : 1.376
                                      Predicted Expected Return   : 0.053
                                      Sharpe Ratio                : 0.039
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.002000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.007000 0.050000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.032000 0.040000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.037000 0.038000
CWB Canadian Western Bank Financial Services Banks 0.040000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.042000 0.036000
BMO Bank of Montreal Financial Services Banks 0.047000 0.034000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.047000 0.034000
IGM IGM Financial Inc. Financial Services Asset Management 0.057000 0.076000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.060000 0.029000
CIX CI Financial Corp. Financial Services Asset Management 0.065000 0.079000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.067000 0.080000
TD Toronto-Dominion Bank Financial Services Banks 0.067000 0.026000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.072000 0.082000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.092000 0.016000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.117000 0.006000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.149000 0.113000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Very Aggressive Investment 
                                      Simulated Risk              : 1.558
                                      Predicted Expected Return   : 0.062
                                      Sharpe Ratio                : 0.04
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.021000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.026000 0.050000
IGM IGM Financial Inc. Financial Services Asset Management 0.030000 0.076000
CIX CI Financial Corp. Financial Services Asset Management 0.036000 0.079000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.039000 0.080000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.043000 0.082000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.047000 0.040000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.052000 0.038000
CWB Canadian Western Bank Financial Services Banks 0.054000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.056000 0.036000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.060000 0.034000
BMO Bank of Montreal Financial Services Banks 0.060000 0.034000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.071000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.077000 0.026000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.099000 0.016000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.109000 0.113000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.120000 0.006000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Very Conservative Investment 
                                      Simulated Risk              : 1.503
                                      Predicted Expected Return   : 0.062
                                      Sharpe Ratio                : 0.041
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ATS ATS Corporation Industrials Industrial Products 0.021000 0.052000
BN Brookfield Corporation Financial Services Asset Management 0.026000 0.050000
IGM IGM Financial Inc. Financial Services Asset Management 0.030000 0.076000
CIX CI Financial Corp. Financial Services Asset Management 0.036000 0.079000
WFG West Fraser Timber Co. Ltd. Basic Materials Forest Products 0.039000 0.080000
AGI Alamos Gold Inc. Basic Materials Metals & Mining 0.043000 0.082000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.047000 0.040000
BTO B2Gold Corp. Basic Materials Metals & Mining 0.052000 0.038000
CWB Canadian Western Bank Financial Services Banks 0.054000 0.037000
SLF Sun Life Financial Inc. Financial Services Insurance 0.056000 0.036000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.060000 0.034000
BMO Bank of Montreal Financial Services Banks 0.060000 0.034000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.071000 0.029000
TD Toronto-Dominion Bank Financial Services Banks 0.077000 0.026000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.099000 0.016000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.109000 0.113000
SIL SilverCrest Metals Inc. Basic Materials Metals & Mining 0.120000 0.006000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'

Portfolio Stress Testing¶

Macroeconomics Key Performance Indicators(KPIs) Data Collection and Preprocessing¶

In this section, we will use Statistic Canads API stats-can to integrate the Canadian economic factors. We will then use Principal Components Analysis(PCA) technique to select the most importance economic factors.

In [68]:
warnings.filterwarnings("ignore")
#                      --------------------------------------------------------------------------------------------                    
#                        Trade Balance: Labour force characteristics by province, monthly, seasonally adjusted
#                      --------------------------------------------------------------------------------------------

def get_trade_balance_rate(reporting_year_period, frequency_date_column ):
        frequency = frequency_date_column[0].upper()
        df = sc.table_to_df("12-10-0011-01")    
        df1 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['Trade'] =='Trade Balance') & 
            (df['Principal trading partners']  == 'All countries'), ['REF_DATE','Trade','Principal trading partners','VALUE']]
        df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    
        trade_balance_df = df1[[frequency_date_column, 'VALUE']]
        trade_balance_df['Trade Balance Rate'] = trade_balance_df['VALUE'].pct_change() * 100
        
        trade_balance_rate_df= trade_balance_df.groupby(frequency_date_column).mean()
        #trade_balance_rate_df = trade_balance_rate_df.rename(columns={'VALUE': 'Unemployment rate'})
        trade_balance_rate_df['Trade Balance Rate'] = round(trade_balance_rate_df['Trade Balance Rate'],1)
        trade_balance_rate_df = trade_balance_rate_df[['Trade Balance Rate']]
        trade_balance_rate_df = trade_balance_rate_df.dropna()
        return trade_balance_rate_df
    
    

#                      --------------------------------------------------------------------------------------------                    
#                        unemployment rate: Labour force characteristics by province, monthly, seasonally adjusted
#                      --------------------------------------------------------------------------------------------

def get_unemployment_rate(reporting_year_period, frequency_date_column ):
        frequency = frequency_date_column[0].upper()
        df = sc.table_to_df("14-10-0287-03")    
        df1 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['Labour force characteristics'] =='Unemployment rate') & 
            (df['UOM']  == 'Percentage'), ['REF_DATE','Labour force characteristics','UOM','VALUE']]
        df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    
        unemployment_rate_df = df1[[frequency_date_column, 'VALUE']]
        unemployment_rate_df= unemployment_rate_df.groupby(frequency_date_column).mean()
        unemployment_rate_df = unemployment_rate_df.rename(columns={'VALUE': 'Unemployment rate'})
        unemployment_rate_df['Unemployment rate'] = round(unemployment_rate_df['Unemployment rate'],1)
        unemployment_rate_df = unemployment_rate_df.dropna()
        return unemployment_rate_df
    
    
#                      --------------------------------------------------------------------------------------------------
#                                Financial market statistics, last Wednesday unless otherwise stated, Bank of Canada
#                      --------------------------------------------------------------------------------------------------

def get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, col_value_name, rate_statement, frequency_date_column ):
    frequency = frequency_date_column[0].upper()
    
    df = sc.table_to_df("10-10-0122-01")
    df2 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['UOM']  == 'Percent') &  (df['Rates'].str.contains(rate_statement)), 
             ['REF_DATE','Rates','UOM','VALUE']]
    
    
    df2[frequency_date_column] = df2['REF_DATE'].dt.to_period(frequency)
    df2 = df2.dropna()
    goc_bonds_or_T_bill_df = df2[[frequency_date_column, 'VALUE']]
    #goc_bonds_or_T_bill_df['VALUE'] = round(goc_bonds_or_T_bill_df['VALUE'],1)
    goc_bonds_or_T_bill_df= goc_bonds_or_T_bill_df.groupby(frequency_date_column).mean()
    goc_bonds_or_T_bill_df= goc_bonds_or_T_bill_df.rename(columns={'VALUE': col_value_name})
    goc_bonds_or_T_bill_df[col_value_name] = round(goc_bonds_or_T_bill_df[col_value_name],1)
   
    return goc_bonds_or_T_bill_df  

#                   ---------------------------------------------------------------------------------------------------------------------
#                            CPI Inflaction:The CPI measures the average change over time in the prices paid by urban consumers 
#                            for a market basket of consumer goods and services, 
#                            and it's a key indicator of inflation (ING Think) (Inflation Calculator).
#                   ---------------------------------------------------------------------------------------------------------------------


def get_CPI_inflaction_rate(reporting_year_period, frequency_date_column):
    frequency = frequency_date_column[0].upper()
    alternative_measures = 'Measure of core inflation based on a factor model, CPI-common (year-over-year percent change)'
    df = sc.table_to_df("18-10-0256-01")
    df2 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['UOM']  == 'Percent') &  
            (df['Alternative measures'] == alternative_measures), 
           ['REF_DATE','Alternative measures','UOM','VALUE']]
    df2[frequency_date_column] = df2['REF_DATE'].dt.to_period(frequency)
    df2 = df2.dropna()
    CPI_inflaction_rate_df = df2[[frequency_date_column, 'VALUE']]
    CPI_inflaction_rate_df= CPI_inflaction_rate_df.groupby(frequency_date_column).mean()
    CPI_inflaction_rate_df= CPI_inflaction_rate_df.rename(columns={'VALUE': 'CPI Inflaction Rate'})
    CPI_inflaction_rate_df['CPI Inflaction Rate'] = round(CPI_inflaction_rate_df['CPI Inflaction Rate'],1)
    return CPI_inflaction_rate_df

#                                    -----------------------------------------------------------------------------------
                                                                   #morgage rate
#                                    -----------------------------------------------------------------------------------


def get_morgage_rate(reporting_year_period, frequency_date_column):
    frequency = frequency_date_column[0].upper()
    df = sc.table_to_df("34-10-0145-01")
    df1 = df.loc[(df['REF_DATE'] >= reporting_year_period), ['REF_DATE', 'UOM','VALUE']]
    df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    
    get_morgage_rate_df = df1[[frequency_date_column, 'VALUE']]
    get_morgage_rate_df = get_morgage_rate_df.groupby(frequency_date_column).mean()
    get_morgage_rate_df = get_morgage_rate_df.rename(columns={'VALUE': 'Morgage Rate'})
    get_morgage_rate_df['Morgage Rate'] = round(get_morgage_rate_df['Morgage Rate'],1)
    get_morgage_rate_df = get_morgage_rate_df.dropna()
    return get_morgage_rate_df

#          -------------------------------------------------------------------------------------------------------------------------------------
#                                                                  prime rate
#                The prime interest rate is the percentage that U.S. commercial banks charge their most creditworthy customers for loans. 
#                Like all loan rates, the prime interest rate is derived from the federal funds' overnight rate, set by the Federal Reserve at 
#                meetings held eight times a year. The prime interest rate is the benchmark banks and other lenders
#                use when setting their interest rates for every category of loan from credit cards to car loans and mortgages.
#          -------------------------------------------------------------------------------------------------------------------------------------

def get_prime_rate(reporting_year_period, frequency_date_column):
    frequency = frequency_date_column[0].upper()
    df = sc.table_to_df("10-10-0145-01")
    df1 = df.loc[(df['REF_DATE'] >= reporting_year_period), ['REF_DATE', 'UOM','VALUE']]
    df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    get_prime_rate_df = df1[[frequency_date_column, 'VALUE']]
    get_prime_rate_df.set_index(frequency_date_column, inplace=True)
    get_prime_rate_df = get_prime_rate_df.groupby(frequency_date_column).mean()
    get_prime_rate_df = get_prime_rate_df.rename(columns={'VALUE': 'Prime Rate'})
    get_prime_rate_df['Prime Rate'] = round(get_prime_rate_df['Prime Rate'],1)
    get_prime_rate_df = get_prime_rate_df.dropna()
    return get_prime_rate_df

#                       ----------------------------------------------------------------------------------------------------
#                                               House Price Index (house and land)
#                       ----------------------------------------------------------------------------------------------------

def get_house_price_index(reporting_year_period, frequency_date_column):
    frequency = frequency_date_column[0].upper()
    df = sc.table_to_df("18-10-0205-02")    
    df1 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['GEO'] =='Canada') & 
                 (df['New housing price indexes'] =='Total (house and land)')
                 , ['REF_DATE','New housing price indexes', 'VALUE']]
    df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    
    get_house_price_index_df = df1[[frequency_date_column, 'VALUE']]
    get_house_price_index_df.set_index(frequency_date_column, inplace=True)
    get_house_price_index_df = ((get_house_price_index_df / get_house_price_index_df.shift(1)) - 1)*100

    get_house_price_index_df= get_house_price_index_df.groupby(frequency_date_column).mean()
    get_house_price_index_df = get_house_price_index_df.rename(columns={'VALUE': 'House Price Index(house and land)'})
    get_house_price_index_df['House Price Index(house and land)'] = round(get_house_price_index_df[
        'House Price Index(house and land)'],1)
    get_house_price_index_df = get_house_price_index_df.dropna()
    return get_house_price_index_df.tail(60)

#                             -----------------------------------------------------------------------------------------
#                                                     Real GDP growth Seasonal adjustment
#                             -----------------------------------------------------------------------------------------

def get_Real_GDP_growth(reporting_year_period, frequency_date_column):
    frequency = frequency_date_column[0].upper()
    df = sc.table_to_df("36-10-0434-02")    
    df1 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['GEO'] =='Canada') &
                 (df['North American Industry Classification System (NAICS)'] =='All industries [T001]'),
                 ['REF_DATE','Seasonal adjustment', 'VALUE']]
    df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    
    get_Real_GDP_growth_df = df1[[frequency_date_column, 'VALUE']]
    get_Real_GDP_growth_df.set_index(frequency_date_column, inplace=True)
    
    #get_Real_GDP_growth_df= get_Real_GDP_growth_df.groupby('MONTH_YEAR').sum()
    get_Real_GDP_growth_df= get_Real_GDP_growth_df.groupby(frequency_date_column).mean()
    get_Real_GDP_growth_df = ((get_Real_GDP_growth_df / get_Real_GDP_growth_df.shift(1)) - 1)*100
    get_Real_GDP_growth_df = get_Real_GDP_growth_df.rename(columns={'VALUE': 'Real GDP growth Seasonal adjustment'})
    get_Real_GDP_growth_df['Real GDP growth Seasonal adjustment'] = round(get_Real_GDP_growth_df[
        'Real GDP growth Seasonal adjustment'],1)
    get_Real_GDP_growth_df = get_Real_GDP_growth_df.dropna()
    return get_Real_GDP_growth_df.tail(60)

#                         ------------------------------------------------------------------------------------------------------
#                                                               Marcket Valatility
                                               

#                          oronto Stock Exchange statistics1: S&P/TSX 60 VIX Index (VIXI.TS)
#                          The S&P/TSX 60 is a market-capitalization-weighted index that tracks the performance of the 60 largest
#                          companies listed on the Toronto Stock Exchange (TSX). The S&P/TSX Composite, on the other hand,
#                          is a broader index that includes all common stocks and income trust units listed on the TSX

#                          The S&P/TSX Composite provides a more comprehensive view of the Canadian stock market. 
#                          3It includes a wider range of companies, from small-cap to large-cap. This makes it a good 
#                          choice for investors who want to diversify their portfolio across different sectors and market capitalizations. 
#                          ://www.spglobal.com/spdji/en/indices/equity/sp-tsx-composite-index/#overview
#                          Toronto Stock Exchange statisticand  :S&P/TSX 60 VIX Index (VIXI.TS),
#                          S&P/TSX Venture Composite Index (^SPCDNX) and S&P/TSX Composite index (^GSPTSE)
#                          The S&P 500 index, or Standard & Poor’s 500, is a very important index that tracks
#                          the performance of the stocks of 500 large-cap companies in the U.S. The ticker symbol for the S&P 500 index is ^GSPC.
#                          The DJIA tracks the stock prices of 30 of the biggest American companies. 
#                          The S&P 500 tracks 500 large-cap American stocks. Both offer a big-picture view of the state of the 
#                          stock markets in general
#                          https://www.investopedia.com/ask/answers/difference-between-dow-jones-industrial-average-and-sp-500/#:
#                          ~:text=Key%20Takeaways,the%20stock%20markets%20in%20general.
#                        ---------------------------------------------------------------------------------------------------------------

def get_market_index_volatility(reporting_year_period, frequency_date_column, market_index_list = ['^GSPTSE', '^GSPC', '^DJI']):
    frequency = frequency_date_column[0].upper()

    start_date = reporting_year_period
    end_date = date.today()
    #index_yahoo_adj_close_price_data = yf.download(market_index_list, start_date, end_date, ['Adj Close'], period ='max')
    #market_adj_close_price_df = index_yahoo_adj_close_price_data['Adj Close']
    market_adj_close_price_df = create_adj_close_price_df(reporting_year_period, market_index_list)
    
    market_adj_close_price_log_return_df = np.log(market_adj_close_price_df/ market_adj_close_price_df.shift(1))  
    # drop columns with all NaN's
    market_adj_close_price_log_return_df = market_adj_close_price_log_return_df.dropna(axis=0)
    
    #Market volatility
        
    market_volatility_df = market_adj_close_price_log_return_df.rolling(center=False,window= 252).std() * np.sqrt(252)
    for col in list(market_volatility_df.columns):
        market_volatility_df = market_volatility_df.rename(columns={col: 'Market '+col+' Volatility Index'})
    
    market_volatility_df = market_volatility_df.dropna(axis=0)
    
    market_volatility_df[frequency_date_column] = pd.to_datetime(market_volatility_df.index, format = '%m/%Y')
    market_volatility_df[frequency_date_column] = market_volatility_df[frequency_date_column].dt.to_period(frequency)
        
    #market_adj_close_price_log_return_frequency_df = market_volatility_df
    market_volatility_df.set_index(frequency_date_column, inplace=True)
    market_volatility_index_df = market_volatility_df.groupby(frequency_date_column).mean()
    market_volatility_index_df = round(market_volatility_index_df,1)
    market_volatility_index_df = market_volatility_index_df.dropna(axis=0)
    return market_volatility_index_df
    #if frequency == 'M' :
    #    return market_volatility_index_df.tail(60)
    #else:
    #    return market_volatility_index_df.tail(20)
    
#-------------------------------------------------------Governement of Canada Bonds average----------------------------------------------

def goc_bonds_average(reporting_year_period, frequency_date_column):
    goc_bonds_average_yield_1_3_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC Marketable Bonds Average Yield: 1-3 year',
        'Government of Canada marketable bonds, average yield: 1-3 year', frequency_date_column)
    goc_bonds_average_yield_5_10_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC Marketable Bonds Average Yield: 5-10 year',
        'Government of Canada marketable bonds, average yield: 5-10 year', frequency_date_column)
    goc_bonds_average_yield_3_5_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC Marketable Bonds Average Yield: 3-5 year',
        'Government of Canada marketable bonds, average yield: 3-5 year', frequency_date_column)
    goc_bonds_average_yield_over_10_years_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 
                                                                                        'GOC Marketable Bonds Average Yield: over 10 years',
         'Government of Canada marketable bonds, average yield: over 10 years', frequency_date_column)
    
    goc_bonds_average_df = goc_bonds_average_yield_1_3_df.merge(goc_bonds_average_yield_5_10_df, 
                                                                on= frequency_date_column, how='inner') \
                                                         .merge(goc_bonds_average_yield_3_5_df, on= frequency_date_column, how='inner') \
                                                         .merge(goc_bonds_average_yield_over_10_years_df, on= frequency_date_column, how='inner') 
    return goc_bonds_average_df

#------------------------- Governement of Canada Benchmark Bonds Yield -------------------------------------------------------------------

def goc_benchmark_bonds_yield(reporting_year_period, frequency_date_column):
    goc_benchmark_bonds_yield_over_2_year_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: 2 year',
                                            'Selected Government of Canada benchmark bond yields: 2 year' , frequency_date_column)
    goc_benchmark_bonds_yield_over_3_year_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: 3 year',
                                                'Selected Government of Canada benchmark bond yields: 3 year', frequency_date_column)
    goc_benchmark_bonds_yield_over_5_year_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: 5 year',
                                    'Selected Government of Canada benchmark bond yields: 5 year', frequency_date_column)
    goc_benchmark_bonds_yield_over_7_year_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: 7 year',
                                    'Selected Government of Canada benchmark bond yields: 7 year', frequency_date_column)
    goc_benchmark_bonds_yield_over_10_years_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: 10 years',
                                    'Selected Government of Canada benchmark bond yields: 10 years', frequency_date_column)
    goc_benchmark_bonds_yield_over_long_term_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: long term',
                                         'Selected Government of Canada benchmark bond yields: long term', frequency_date_column)
    goc_benchmark_bonds_yield_df = \
            goc_benchmark_bonds_yield_over_2_year_df.merge(goc_benchmark_bonds_yield_over_3_year_df, 
                                                       on= frequency_date_column, how='inner') \
                                .merge(goc_benchmark_bonds_yield_over_5_year_df, on= frequency_date_column, how='inner') \
                                .merge(goc_benchmark_bonds_yield_over_7_year_df, on= frequency_date_column, how='inner') \
                                .merge(goc_benchmark_bonds_yield_over_10_years_df, on= frequency_date_column, how='inner') \
                                .merge(goc_benchmark_bonds_yield_over_long_term_df, on= frequency_date_column, how='inner')  
                                
    return goc_benchmark_bonds_yield_df

 #------------------------------------------------------------Governement of Canada Treasurt Bills --------------------------------------------
def Treasury_bills(reporting_year_period, frequency_date_column):
    
    Treasury_bills_1_month_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'Treasury bills: 1 month',
                                'Treasury bills: 1 month', frequency_date_column)
    Treasury_bills_2_month_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'Treasury bills: 2 month',
                                'Treasury bills: 2 month', frequency_date_column)
    Treasury_bills_3_month_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'Treasury bills: 3 month',
                                 'Treasury bills: 3 month', frequency_date_column)
    Treasury_bills_6_month_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'Treasury bills: 6 month',
                                'Treasury bills: 6 month', frequency_date_column)
    Treasury_bills_1_year_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'Treasury bills: 1 year',
                                'Treasury bills: 1 year', frequency_date_column)
    Treasury_bills_df = Treasury_bills_1_month_df.merge(Treasury_bills_2_month_df, on= frequency_date_column, how='inner') \
                                            .merge(Treasury_bills_3_month_df, on= frequency_date_column, how='inner') \
                                            .merge(Treasury_bills_6_month_df, on= frequency_date_column, how='inner') \
                                            .merge(Treasury_bills_1_year_df, on=frequency_date_column, how='inner')  
    return Treasury_bills_df

#-----------------------------------------  Other Economic Factors ------------------------------------------------------------------

def other_economic_factors(reporting_year_period, frequency_date_column):
    unemployment_rate_df = get_unemployment_rate(reporting_year_period, frequency_date_column)
    CPI_inflaction_rate_df = get_CPI_inflaction_rate(reporting_year_period, frequency_date_column)
    get_morgage_rate_df = get_morgage_rate(reporting_year_period, frequency_date_column)
    get_prime_rate_df = get_prime_rate(reporting_year_period, frequency_date_column)
    get_house_price_index_df = get_house_price_index(reporting_year_period, frequency_date_column)
    get_Real_GDP_growth_df = get_Real_GDP_growth(reporting_year_period, frequency_date_column)
    market_index_volatility_df = get_market_index_volatility(reporting_year_period, frequency_date_column)
    trade_balance_rate_df = get_trade_balance_rate(reporting_year_period, frequency_date_column)
    
    other_economic_factors_df = CPI_inflaction_rate_df.merge(get_morgage_rate_df, on= frequency_date_column, how='inner') \
                                                    .merge(get_prime_rate_df, on= frequency_date_column, how='inner') \
                                                .merge(get_house_price_index_df, on= frequency_date_column, how='inner') \
                                                .merge(unemployment_rate_df, on= frequency_date_column, how='inner') \
                                                .merge(get_Real_GDP_growth_df, on= frequency_date_column, how='inner') \
                                                .merge(market_index_volatility_df, on= frequency_date_column, how='inner')  
    
    return other_economic_factors_df

#-----------------------------------------------------------All the Economic Factors -----------------------------------------
def get_economic_factors_df(reporting_year_period, reporting_frequency):
     #set reporting frequency
        
    if reporting_frequency.capitalize() == 'Month' or reporting_frequency.capitalize() == 'Quarter':
        frequency_date_column = reporting_frequency.capitalize() + '_Year'
        #frequency = reporting_frequency[0].upper()
        
        goc_bonds_average_df = goc_bonds_average(reporting_year_period, frequency_date_column)
        goc_benchmark_bonds_yield_df = goc_benchmark_bonds_yield(reporting_year_period, frequency_date_column) 
        Treasury_bills_df = Treasury_bills(reporting_year_period, frequency_date_column)
        other_economic_factors_df = other_economic_factors(reporting_year_period, frequency_date_column)
    
        economic_factors_df = goc_bonds_average_df.merge(goc_benchmark_bonds_yield_df, on= frequency_date_column, how='inner') \
                                            .merge(Treasury_bills_df, on= frequency_date_column, how='inner') \
                                            .merge(other_economic_factors_df, on= frequency_date_column, how='inner')  

        
        return economic_factors_df
    else:
        return 'The reporting frequency should be alphanbetic, Month or Qurater'
    
#-------------------------------------------------------------Macroeconomics factors Plotting---------------------------------------

def annotate_bars(ax):# this function is generated by ChatGPT
    for p in ax.patches:
        width, height = p.get_width(), p.get_height()
        x, y = p.get_xy() 
        ax.annotate(f'{height:.1f}', (x + width/2, y + height/2), ha='center', va='center', fontsize=10, color='black')


def get_economic_factors_barplotting(goc_bonds_average_df, goc_benchmark_bonds_yield_df,Treasury_bills_df, other_economic_factors_df ):
    fig, axes =plt.subplots(4,1,figsize=(20, 35), constrained_layout=True)  
   
        
    bar_width = 0.7 
        
    bar0 = goc_bonds_average_df.plot(kind='bar', width=bar_width, stacked=True, ax = axes[0])
    bar0.set_title('Governement of Canada Bonds Average',color='black')
    bar0.legend(loc='best')
    annotate_bars(axes[0])
        
    bar1 = goc_benchmark_bonds_yield_df.plot(kind='bar', width=bar_width, stacked=True, ax = axes[1]) 
    bar1.set_title('Governement of Canada Benchmark Bonds Yield',color='black')
    bar1.legend(loc='best')
    annotate_bars(axes[1])
        
    bar2 = Treasury_bills_df.plot(kind='bar', width=bar_width, stacked=True, ax = axes[2]) 
    bar2.set_title("Governement of Canada Treasury Bills",color='black')
    bar2.legend(loc='best')
    annotate_bars(axes[2])
       
    bar3 = other_economic_factors_df.plot(kind='bar', width=bar_width, stacked=True, ax = axes[3]) 
    bar3.set_title('Governement of Canada Other Economic Factirs',color='black')
    bar3.legend(loc='best')
    annotate_bars(axes[3])
        
        
#----------------------------Principal Components Analysis(PCA) to select most importance economic factors ---------------------------------

def selecting_importent_economic_factors_treshold_method_PCA(df,threshold):
    
    return df[(df.abs() > threshold).any(axis=1)].index.to_list()

def setting_PCA_for_economic_factors(economic_factors_df):
    # economic indicators dataset
   # economic_factors_df = get_economic_factors_df(reporting_year_period, reporting_frequency)

    # Standardizing the data
    scaler = StandardScaler()
    scaled_data_df = scaler.fit_transform(economic_factors_df)

    # Applying PCA
    all_pca = PCA(n_components=None)  # Use all components to find the best number of important indicators
    all_principal_components = all_pca.fit_transform(scaled_data_df)

    # Explained variance
    explained_variance = all_pca.explained_variance_ratio_

    # Principal Component Loadings(coefficients)
    loadings_matrix = all_pca.components_

    # Create a DataFrame for loadings 
    loadings_matrix_df = pd.DataFrame(loadings_matrix.T, columns=[f'PC{i+1}' for i in range(loadings_matrix.shape[0])], 
                                      index=economic_factors_df.columns)
    return loadings_matrix_df, explained_variance

def get_num_components(explained_variance,cumulative_variance_treshold = 0.9):    
    # Determine the number of components explaining the cumulative varience treshold of the variance
    cumulative_variance = explained_variance.cumsum()
    return  (cumulative_variance <= cumulative_variance_treshold).sum() + 1

def select_top_components_df(loadings_matrix_df, num_components, threshold_for_high_loadings = 0.5):
    # Select top components
    return loadings_matrix_df.iloc[:, :num_components]
     
def select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_high_loadings = 0.5):
    # Select top components
    selected_components_df = loadings_matrix_df.iloc[:, :num_components]
    # Find indicators with high loadings
    return selected_components_df[(selected_components_df.abs() > threshold_for_high_loadings).any(axis=1)]

    
def plot_explained_variance_(economic_factors_df):
    
    loadings_matrix_df, explained_variance =  setting_PCA_for_economic_factors(economic_factors_df)
    
    # Print explained variance
    
    explained_variance_df = pd.DataFrame(explained_variance).T
    explained_variance_df.columns = loadings_matrix_df.columns
    display(explained_variance_df)
    
    
    # Plotting the explained variance
    plt.figure(figsize=(10, 6))
    plt.bar(range(1, len(explained_variance) + 1), explained_variance, alpha=0.5, align='center', label='individual explained variance')
    plt.step(range(1, len(explained_variance) + 1), np.cumsum(explained_variance), where='mid', label='cumulative explained variance')
    plt.xlabel('Principal Components')
    plt.ylabel('Explained Variance Ratio')
    plt.title('Explained Variance by Principal Components')
    plt.legend(loc='best')
    plt.show()
    

def plotting_corr_matrix(economic_factors_matrix, title):
    
    g = sns.clustermap(economic_factors_matrix ,  method = 'complete', cmap   = 'RdBu', annot  = True, annot_kws = {'size': 15},figsize=(20, 15))   
    g.fig.suptitle(title, y=0.9, fontsize=12)
    g.cax.set_position([1.02, 0.2, 0.03, 0.4])  # [left, bottom, width, height]
    plt.subplots_adjust(top=0.85)
    plt.setp(g.ax_heatmap.get_xticklabels(), rotation=90)
    plt.setp(g.ax_heatmap.get_yticklabels(), rotation=360)
    
def get_most_important_economic_factors_list(economic_factors_df,
                                             cumulative_variance_treshold = 1, threshold_for_highest_loadings = 0.5):
        
        #plot_explained_variance_(reporting_year_period, reporting_frequency)
        #plot_explained_variance_(economic_factors_df)
        loadings_matrix_df, explained_variance = setting_PCA_for_economic_factors(economic_factors_df)
        #print('\nloadings_matrix_df\n')
        #display(loadings_matrix_df)
        num_components = get_num_components(explained_variance,cumulative_variance_treshold)
        top_components_df = select_top_components_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
        #print('\ntop_components_df\n')
        #display(top_components_df)
        #print('\ntop_indicators_df\n')
        top_indicators_df = select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
        #display(top_indicators_df)
        most_important_economic_factors_list = selecting_importent_economic_factors_treshold_method_PCA(top_indicators_df,
                                                                                 threshold_for_highest_loadings)
        
        return most_important_economic_factors_list
    
    
def plotting_most_important_economic_factors_list(economic_factors_df,
                                             cumulative_variance_treshold = 1, threshold_for_highest_loadings = 0.5):
        
        #plot_explained_variance_(reporting_year_period, reporting_frequency)
        plot_explained_variance_(economic_factors_df)
        loadings_matrix_df, explained_variance = setting_PCA_for_economic_factors(economic_factors_df)
        print('\nloadings_matrix_df\n')
        display(loadings_matrix_df)
        num_components = get_num_components(explained_variance,cumulative_variance_treshold)
        top_components_df = select_top_components_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
        print('\ntop_components_df\n')
        display(top_components_df)
        print('\ntop_indicators_df\n')
        top_indicators_df = select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
        display(top_indicators_df)
        most_important_economic_factors_list = selecting_importent_economic_factors_treshold_method_PCA(top_indicators_df,
                                                                                 threshold_for_highest_loadings)
        
    

def get_most_important_economic_factors_df(economic_factors_df, most_important_economic_factors_list):
        return economic_factors_df[most_important_economic_factors_list]
        

def get_most_important_economic_factors_matrix(most_important_economic_factors_df):
    return  generate_correlation_matrix(most_important_economic_factors_df)   

def plotting_most_important_economic_factors_corr_clustermap(most_important_economic_factors_matrix):
    
        #PCA couple with covarience matrice to select most important factors
       
        plotting_corr_matrix(most_important_economic_factors_matrix,'Most Important Economic Factors Correlation Matrix Cluster Map using PCA')
        

#----------------------------------------------------------------------Main Data Setting-------------------------------------------
reporting_year_period = start_date(365*5)
reporting_frequency = 'Quarter'

cumulative_variance_treshold = 1.0
threshold_for_highest_loadings = 0.5
correlation_coefficient_treshold = 0.3

#Economic Factors Data Frames
goc_bonds_average_df = goc_bonds_average(reporting_year_period, reporting_frequency)
goc_benchmark_bonds_yield_df = goc_benchmark_bonds_yield(reporting_year_period, reporting_frequency) 
Treasury_bills_df = Treasury_bills(reporting_year_period, reporting_frequency)
other_economic_factors_df = other_economic_factors(reporting_year_period, reporting_frequency)
trade_balance_rate_df = get_trade_balance_rate(reporting_year_period, reporting_frequency)
economic_factors_df = get_economic_factors_df(reporting_year_period, reporting_frequency)

#All the economic factors correlation matrice    
economic_factors_matrix =  generate_correlation_matrix(economic_factors_df)   

#Principal Components Analysis(PCA) to select  Most Important Economic Factors
most_important_economic_factors_list = get_most_important_economic_factors_list(economic_factors_df,  cumulative_variance_treshold, 
                                                                                threshold_for_highest_loadings)
most_important_economic_factors_df = get_most_important_economic_factors_df(economic_factors_df, most_important_economic_factors_list)
most_important_economic_factors_matrix = get_most_important_economic_factors_matrix(most_important_economic_factors_df)

#-------------------------------------------------Data Visualization------------------------------------------------------------------
def print_economic_factors_data_table():
    print('\n                             **********************************************************\n'+
             '                              All the Economic Factors Data Tables\n'+
              '                         *********************************************************\n')

    display(economic_factors_df)
    get_economic_factors_barplotting(goc_bonds_average_df, goc_benchmark_bonds_yield_df,Treasury_bills_df, other_economic_factors_df ) 


def print_economic_factors_data_corr_matrix():
    print('\n                             **********************************************************\n'+
             '                              All the Economic Factors Correlation Matrix\n'+
              '                         *********************************************************\n')

    display(economic_factors_matrix)
    plotting_corr_matrix(economic_factors_matrix, 'All the economic factors')

def print_most_important_economic_factors():

    print('\n                        *****************************************************************************************\n'+
             '                               Principal Components Analysis(PCA) to select  Most Important Economic Factors \n'+
              '                          ****************************************************************************************\n')

    print('Principal Components Analysis(PCA) to select  Most Important Economic Factors \n')
    plotting_most_important_economic_factors_list(economic_factors_df, cumulative_variance_treshold, threshold_for_highest_loadings)
    print('\n most_important_economic_factors_df\n')    
    display(most_important_economic_factors_df)
    print('\n most_important_economic_factors_matrix\n')
    display(most_important_economic_factors_matrix)
    plotting_corr_matrix(most_important_economic_factors_matrix, 'Most Important Economic Factors correlation Matrix - PCA Method')
[*********************100%%**********************]  3 of 3 completed
[*********************100%%**********************]  3 of 3 completed

Data Visualization¶

In [69]:
print_economic_factors_data_table()
                             **********************************************************
                              All the Economic Factors Data Tables
                         *********************************************************

GOC Marketable Bonds Average Yield: 1-3 year GOC Marketable Bonds Average Yield: 5-10 year GOC Marketable Bonds Average Yield: 3-5 year GOC Marketable Bonds Average Yield: over 10 years GOC benchmark bond yields: 2 year GOC benchmark bond yields: 3 year GOC benchmark bond yields: 5 year GOC benchmark bond yields: 7 year GOC benchmark bond yields: 10 years GOC benchmark bond yields: long term ... Treasury bills: 2 month Treasury bills: 3 month Treasury bills: 6 month Treasury bills: 1 year CPI Inflaction Rate Morgage Rate Prime Rate House Price Index(house and land) Unemployment rate Real GDP growth Seasonal adjustment
Quarter_Year
2020Q1 1.2 1.1 1.1 1.3 1.1 1.1 1.1 1.1 1.1 1.4 ... 1.3 1.2 1.2 1.2 2.0 4.0 1.8 0.2 4.6 -2.1
2020Q2 0.3 0.5 0.4 1.0 0.3 0.3 0.4 0.4 0.5 1.1 ... 0.2 0.2 0.3 0.3 1.6 3.9 1.2 0.1 7.8 -10.6
2020Q3 0.2 0.5 0.3 1.0 0.3 0.3 0.4 0.4 0.6 1.1 ... 0.2 0.1 0.2 0.2 1.4 3.6 1.1 0.7 5.9 8.9
2020Q4 0.2 0.6 0.4 1.1 0.2 0.3 0.4 0.5 0.7 1.2 ... 0.1 0.1 0.1 0.2 1.7 3.4 1.0 0.5 5.2 2.1
2021Q1 0.2 1.0 0.5 1.7 0.2 0.3 0.7 0.9 1.2 1.8 ... 0.1 0.1 0.1 0.1 1.7 3.3 1.1 1.2 5.5 1.2
2021Q2 0.3 1.3 0.8 1.9 0.4 0.5 0.9 1.2 1.5 2.0 ... 0.1 0.1 0.2 0.2 2.4 3.3 1.2 1.3 5.1 -0.1
2021Q3 0.4 1.2 0.8 1.8 0.5 0.6 0.9 1.1 1.3 1.8 ... 0.2 0.2 0.2 0.3 2.9 3.2 1.2 0.5 4.6 1.6
2021Q4 1.0 1.5 1.3 1.9 1.0 1.1 1.4 1.5 1.6 1.9 ... 0.1 0.1 0.3 0.7 3.1 3.4 1.3 0.6 4.1 1.6
2022Q1 1.6 2.0 1.9 2.3 1.7 1.8 2.0 2.0 2.1 2.2 ... 0.3 0.4 0.9 1.4 4.0 3.6 1.6 1.1 4.3 0.8
2022Q2 2.7 2.9 2.8 3.0 2.7 2.8 2.8 2.8 2.9 2.9 ... 1.4 1.6 2.1 2.6 5.3 4.6 2.5 0.3 3.7 1.1
2022Q3 3.5 3.0 3.2 3.0 3.5 3.4 3.1 3.0 3.0 2.9 ... 2.9 3.1 3.4 3.7 5.8 5.6 3.2 0.0 3.6 0.5
2022Q4 3.9 3.2 3.5 3.3 3.9 3.7 3.3 3.1 3.2 3.2 ... 3.9 4.1 4.2 4.4 6.0 5.8 3.7 -0.1 3.5 -0.0
2023Q1 3.9 3.0 3.3 3.1 3.8 3.6 3.2 3.0 3.0 3.1 ... 4.4 4.4 4.4 4.4 5.9 5.8 3.9 -0.2 3.8 0.6
2023Q2 4.1 3.1 3.4 3.1 4.1 3.8 3.3 3.1 3.1 3.1 ... 4.6 4.6 4.7 4.7 5.3 5.8 4.0 0.0 3.7 0.2
2023Q3 4.8 3.8 4.1 3.6 4.8 4.5 4.0 3.8 3.7 3.5 ... 5.0 5.0 5.1 5.2 4.6 6.1 4.3 -0.1 3.7 -0.1
2023Q4 4.3 3.6 3.7 3.4 4.3 4.1 3.7 3.6 3.6 3.4 ... 5.0 5.0 5.0 4.8 4.0 6.4 4.3 -0.1 3.6 0.1
2024Q1 4.2 3.4 3.6 3.4 4.1 3.9 3.5 3.4 3.4 3.3 ... 5.0 5.0 4.9 4.8 3.1 6.2 4.1 0.0 4.0 0.5
2024Q2 4.3 3.7 3.8 3.6 4.2 4.1 3.7 3.7 3.7 3.6 ... 4.8 4.8 4.8 4.6 2.5 6.1 4.1 0.2 4.0 0.4

18 rows × 21 columns

In [70]:
print_economic_factors_data_corr_matrix()
                             **********************************************************
                              All the Economic Factors Correlation Matrix
                         *********************************************************

GOC Marketable Bonds Average Yield: 1-3 year GOC Marketable Bonds Average Yield: 5-10 year GOC Marketable Bonds Average Yield: 3-5 year GOC Marketable Bonds Average Yield: over 10 years GOC benchmark bond yields: 2 year GOC benchmark bond yields: 3 year GOC benchmark bond yields: 5 year GOC benchmark bond yields: 7 year GOC benchmark bond yields: 10 years GOC benchmark bond yields: long term ... Treasury bills: 2 month Treasury bills: 3 month Treasury bills: 6 month Treasury bills: 1 year CPI Inflaction Rate Morgage Rate Prime Rate House Price Index(house and land) Unemployment rate Real GDP growth Seasonal adjustment
GOC Marketable Bonds Average Yield: 1-3 year 1.000000 0.976199 0.993209 0.951231 0.999462 0.998315 0.989338 0.977915 0.967320 0.952575 ... 0.966132 0.973636 0.987148 0.996983 0.725500 0.972380 0.989560 -0.744187 -0.746375 -0.031706
GOC Marketable Bonds Average Yield: 5-10 year 0.976199 1.000000 0.992777 0.992448 0.978538 0.985369 0.995928 0.999459 0.998761 0.991043 ... 0.913679 0.925168 0.945192 0.962425 0.739110 0.920224 0.952833 -0.625150 -0.802838 0.009622
GOC Marketable Bonds Average Yield: 3-5 year 0.993209 0.992777 1.000000 0.975615 0.994331 0.997760 0.998818 0.993509 0.987007 0.973534 ... 0.936388 0.947076 0.966219 0.983328 0.756007 0.946726 0.971334 -0.693431 -0.785146 -0.017584
GOC Marketable Bonds Average Yield: over 10 years 0.951231 0.992448 0.975615 1.000000 0.954739 0.963174 0.980846 0.989952 0.995923 0.998236 ... 0.885694 0.898744 0.918627 0.937023 0.746162 0.886569 0.927391 -0.561822 -0.805263 0.029892
GOC benchmark bond yields: 2 year 0.999462 0.978538 0.994331 0.954739 1.000000 0.998947 0.990991 0.979915 0.970160 0.955397 ... 0.962590 0.970533 0.985163 0.996038 0.734083 0.969309 0.987723 -0.735073 -0.750504 -0.021169
GOC benchmark bond yields: 3 year 0.998315 0.985369 0.997760 0.963174 0.998947 1.000000 0.995408 0.986585 0.977878 0.962796 ... 0.953059 0.962051 0.978696 0.992004 0.740256 0.962145 0.982265 -0.720636 -0.765741 -0.014608
GOC benchmark bond yields: 5 year 0.989338 0.995928 0.998818 0.980846 0.990991 0.995408 1.000000 0.996909 0.991605 0.978755 ... 0.929439 0.940004 0.960019 0.977644 0.751887 0.937894 0.966160 -0.668349 -0.793680 -0.001892
GOC benchmark bond yields: 7 year 0.977915 0.999459 0.993509 0.989952 0.979915 0.986585 0.996909 1.000000 0.997936 0.988589 ... 0.915759 0.926639 0.946566 0.963724 0.735470 0.920864 0.954083 -0.626895 -0.806113 0.007730
GOC benchmark bond yields: 10 years 0.967320 0.998761 0.987007 0.995923 0.970160 0.977878 0.991605 0.997936 1.000000 0.994981 ... 0.904050 0.915834 0.935894 0.953085 0.736259 0.907991 0.943930 -0.591373 -0.811841 0.030765
GOC benchmark bond yields: long term 0.952575 0.991043 0.973534 0.998236 0.955397 0.962796 0.978755 0.988589 0.994981 1.000000 ... 0.898326 0.909881 0.926861 0.940557 0.726525 0.894927 0.935540 -0.562839 -0.794732 0.025609
Treasury bills: 1 month 0.958881 0.903818 0.927301 0.876058 0.955022 0.944663 0.919827 0.906120 0.894311 0.889460 ... 0.999360 0.997743 0.990470 0.974308 0.569049 0.978951 0.987231 -0.765780 -0.633458 -0.047291
Treasury bills: 2 month 0.966132 0.913679 0.936388 0.885694 0.962590 0.953059 0.929439 0.915759 0.904050 0.898326 ... 1.000000 0.999162 0.994025 0.980071 0.587856 0.982410 0.991599 -0.768520 -0.644867 -0.045324
Treasury bills: 3 month 0.973636 0.925168 0.947076 0.898744 0.970533 0.962051 0.940004 0.926639 0.915834 0.909881 ... 0.999162 1.000000 0.997109 0.986143 0.612922 0.986454 0.995128 -0.770248 -0.658274 -0.048666
Treasury bills: 6 month 0.987148 0.945192 0.966219 0.918627 0.985163 0.978696 0.960019 0.946566 0.935894 0.926861 ... 0.994025 0.997109 1.000000 0.995380 0.654033 0.989194 0.999187 -0.764929 -0.682239 -0.045914
Treasury bills: 1 year 0.996983 0.962425 0.983328 0.937023 0.996038 0.992004 0.977644 0.963724 0.953085 0.940557 ... 0.980071 0.986143 0.995380 1.000000 0.709570 0.982448 0.995893 -0.761639 -0.720879 -0.036478
CPI Inflaction Rate 0.725500 0.739110 0.756007 0.746162 0.734083 0.740256 0.751887 0.735470 0.736259 0.726525 ... 0.587856 0.612922 0.654033 0.709570 1.000000 0.629513 0.666817 -0.541372 -0.751794 0.017866
Morgage Rate 0.972380 0.920224 0.946726 0.886569 0.969309 0.962145 0.937894 0.920864 0.907991 0.894927 ... 0.982410 0.986454 0.989194 0.982448 0.629513 1.000000 0.987620 -0.805028 -0.617461 -0.090534
Prime Rate 0.989560 0.952833 0.971334 0.927391 0.987723 0.982265 0.966160 0.954083 0.943930 0.935540 ... 0.991599 0.995128 0.999187 0.995893 0.666817 0.987620 1.000000 -0.762774 -0.694479 -0.048387
House Price Index(house and land) -0.744187 -0.625150 -0.693431 -0.561822 -0.735073 -0.720636 -0.668349 -0.626895 -0.591373 -0.562839 ... -0.768520 -0.770248 -0.764929 -0.761639 -0.541372 -0.805028 -0.762774 1.000000 0.388159 0.275270
Unemployment rate -0.746375 -0.802838 -0.785146 -0.805263 -0.750504 -0.765741 -0.793680 -0.806113 -0.811841 -0.794732 ... -0.644867 -0.658274 -0.682239 -0.720879 -0.751794 -0.617461 -0.694479 0.388159 1.000000 -0.353492
Real GDP growth Seasonal adjustment -0.031706 0.009622 -0.017584 0.029892 -0.021169 -0.014608 -0.001892 0.007730 0.030765 0.025609 ... -0.045324 -0.048666 -0.045914 -0.036478 0.017866 -0.090534 -0.048387 0.275270 -0.353492 1.000000

21 rows × 21 columns

In [71]:
print_most_important_economic_factors()
                        *****************************************************************************************
                               Principal Components Analysis(PCA) to select  Most Important Economic Factors 
                          ****************************************************************************************

Principal Components Analysis(PCA) to select  Most Important Economic Factors 

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18
0 0.85935 0.068931 0.03383 0.024056 0.007083 0.004152 0.001816 0.000543 0.000109 0.000046 0.000031 0.000021 0.000012 0.000007 0.000005 0.000004 0.000002 4.703571e-34
loadings_matrix_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18
GOC Marketable Bonds Average Yield: 1-3 year 0.234760 0.026131 0.016201 -0.012893 -0.024366 0.061716 0.291109 -0.139295 0.133451 0.157802 0.515279 0.182155 0.241015 -0.101838 0.360692 -0.184581 -0.003957 0.170916
GOC Marketable Bonds Average Yield: 5-10 year 0.231555 -0.080234 -0.095232 0.153528 0.031926 0.220653 -0.045993 0.043579 -0.190157 0.094682 -0.343397 -0.213668 -0.071449 0.159755 0.474708 -0.266721 0.230909 -0.350206
GOC Marketable Bonds Average Yield: 3-5 year 0.234132 -0.029851 -0.072510 0.042401 -0.000326 0.170944 0.217199 -0.078918 0.286689 -0.068549 0.178631 -0.298389 -0.702733 0.011258 0.143179 0.156651 0.048250 0.146997
GOC Marketable Bonds Average Yield: over 10 years 0.227174 -0.122381 -0.136092 0.217986 -0.057406 0.134132 -0.429816 -0.114250 0.444101 -0.171831 -0.208001 0.320040 -0.023063 0.076492 0.068157 -0.254903 -0.325177 0.221735
GOC benchmark bond yields: 2 year 0.234779 0.013805 0.008394 -0.010948 -0.061867 0.068719 0.297248 -0.173324 0.017306 -0.047074 -0.160140 -0.375752 0.524676 0.397393 0.059814 0.058629 -0.202109 0.198754
GOC benchmark bond yields: 3 year 0.234758 -0.005337 -0.016723 0.008134 -0.029301 0.139984 0.287062 -0.090465 0.039591 0.137423 -0.004180 -0.218928 -0.028056 -0.277546 -0.505661 -0.334065 0.068860 -0.036051
GOC benchmark bond yields: 5 year 0.233485 -0.051348 -0.076537 0.077274 0.002426 0.185022 0.216081 -0.095769 -0.199210 -0.309908 0.116107 0.498475 -0.034998 0.289604 -0.361245 -0.134422 0.216742 -0.104021
GOC benchmark bond yields: 7 year 0.231707 -0.078183 -0.090904 0.152294 0.065766 0.199976 0.051898 0.010020 -0.347636 -0.330405 -0.001623 0.058658 0.114303 -0.590334 0.179727 0.203702 -0.278951 -0.212126
GOC benchmark bond yields: 10 years 0.230046 -0.107253 -0.098440 0.189327 0.017254 0.166038 -0.119412 0.138684 -0.133025 0.001400 -0.256611 -0.100427 0.049083 0.024542 -0.254798 0.375737 0.279357 0.379159
GOC benchmark bond yields: long term 0.227639 -0.108980 -0.102382 0.240735 -0.052869 0.050951 -0.511097 -0.023798 -0.019783 0.353407 0.502917 -0.195265 0.169617 0.019484 -0.164477 0.134039 0.065868 -0.173841
Treasury bills: 1 month 0.225278 0.128380 0.259529 0.019614 0.025262 -0.352326 -0.132600 -0.161266 0.086808 -0.541618 0.152998 -0.267793 -0.031216 0.162620 -0.103672 0.115781 -0.119419 -0.296309
Treasury bills: 2 month 0.226989 0.119869 0.237662 0.011307 0.007964 -0.309849 -0.107582 -0.145893 -0.201502 -0.078164 -0.020385 0.101861 -0.005137 -0.203259 0.208998 0.083060 0.322462 0.497798
Treasury bills: 3 month 0.228918 0.111167 0.202732 0.004675 -0.015006 -0.274253 -0.119133 -0.032772 0.104932 0.042636 -0.161700 -0.009663 0.049030 -0.144779 -0.006277 -0.375099 0.378795 -0.164712
Treasury bills: 6 month 0.231878 0.088394 0.144440 -0.000347 -0.064215 -0.175812 0.048952 -0.000216 -0.009291 0.301415 -0.265119 -0.029288 -0.099210 -0.238959 -0.209504 -0.082658 -0.459009 0.058260
Treasury bills: 1 year 0.234082 0.053455 0.065061 -0.036268 -0.075239 -0.075118 0.192900 -0.096072 0.369251 0.282200 -0.215109 0.348370 0.066918 0.001790 0.046062 0.547883 0.120218 -0.373485
CPI Inflaction Rate 0.173762 -0.180793 -0.575582 -0.504796 -0.493601 -0.289764 -0.058610 0.010737 -0.074097 -0.086223 -0.010197 -0.031054 -0.013493 -0.045574 0.023215 -0.007326 0.039455 -0.009281
Morgage Rate 0.227671 0.150522 0.160041 -0.041466 -0.149447 0.043348 0.053773 0.891069 0.145476 -0.116283 0.110641 0.020574 0.072023 0.044917 0.027882 -0.039727 -0.025172 0.026488
Prime Rate 0.232769 0.079864 0.118729 0.003283 -0.045882 -0.147697 -0.017152 0.019519 -0.508541 0.292196 0.064961 0.185499 -0.310753 0.371407 0.050503 -0.003452 -0.308323 0.006737
House Price Index(house and land) -0.171404 -0.381353 -0.085620 0.663168 -0.268190 -0.450428 0.298552 0.103229 0.016639 -0.009538 0.011201 -0.002338 -0.013328 0.005792 0.027148 -0.033986 0.003009 -0.002548
Unemployment rate -0.181182 0.436201 0.172573 0.184132 -0.757262 0.304913 -0.061990 -0.167199 -0.067842 -0.048952 -0.015296 -0.015750 -0.028583 -0.042345 0.013702 0.025426 0.041119 -0.014845
Real GDP growth Seasonal adjustment -0.003512 -0.702397 0.578009 -0.267773 -0.242523 0.190165 -0.055179 -0.037586 -0.019916 -0.010577 0.004074 -0.000240 -0.015595 -0.011826 0.009457 0.008515 0.003751 -0.009416
top_components_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17
GOC Marketable Bonds Average Yield: 1-3 year 0.234760 0.026131 0.016201 -0.012893 -0.024366 0.061716 0.291109 -0.139295 0.133451 0.157802 0.515279 0.182155 0.241015 -0.101838 0.360692 -0.184581 -0.003957
GOC Marketable Bonds Average Yield: 5-10 year 0.231555 -0.080234 -0.095232 0.153528 0.031926 0.220653 -0.045993 0.043579 -0.190157 0.094682 -0.343397 -0.213668 -0.071449 0.159755 0.474708 -0.266721 0.230909
GOC Marketable Bonds Average Yield: 3-5 year 0.234132 -0.029851 -0.072510 0.042401 -0.000326 0.170944 0.217199 -0.078918 0.286689 -0.068549 0.178631 -0.298389 -0.702733 0.011258 0.143179 0.156651 0.048250
GOC Marketable Bonds Average Yield: over 10 years 0.227174 -0.122381 -0.136092 0.217986 -0.057406 0.134132 -0.429816 -0.114250 0.444101 -0.171831 -0.208001 0.320040 -0.023063 0.076492 0.068157 -0.254903 -0.325177
GOC benchmark bond yields: 2 year 0.234779 0.013805 0.008394 -0.010948 -0.061867 0.068719 0.297248 -0.173324 0.017306 -0.047074 -0.160140 -0.375752 0.524676 0.397393 0.059814 0.058629 -0.202109
GOC benchmark bond yields: 3 year 0.234758 -0.005337 -0.016723 0.008134 -0.029301 0.139984 0.287062 -0.090465 0.039591 0.137423 -0.004180 -0.218928 -0.028056 -0.277546 -0.505661 -0.334065 0.068860
GOC benchmark bond yields: 5 year 0.233485 -0.051348 -0.076537 0.077274 0.002426 0.185022 0.216081 -0.095769 -0.199210 -0.309908 0.116107 0.498475 -0.034998 0.289604 -0.361245 -0.134422 0.216742
GOC benchmark bond yields: 7 year 0.231707 -0.078183 -0.090904 0.152294 0.065766 0.199976 0.051898 0.010020 -0.347636 -0.330405 -0.001623 0.058658 0.114303 -0.590334 0.179727 0.203702 -0.278951
GOC benchmark bond yields: 10 years 0.230046 -0.107253 -0.098440 0.189327 0.017254 0.166038 -0.119412 0.138684 -0.133025 0.001400 -0.256611 -0.100427 0.049083 0.024542 -0.254798 0.375737 0.279357
GOC benchmark bond yields: long term 0.227639 -0.108980 -0.102382 0.240735 -0.052869 0.050951 -0.511097 -0.023798 -0.019783 0.353407 0.502917 -0.195265 0.169617 0.019484 -0.164477 0.134039 0.065868
Treasury bills: 1 month 0.225278 0.128380 0.259529 0.019614 0.025262 -0.352326 -0.132600 -0.161266 0.086808 -0.541618 0.152998 -0.267793 -0.031216 0.162620 -0.103672 0.115781 -0.119419
Treasury bills: 2 month 0.226989 0.119869 0.237662 0.011307 0.007964 -0.309849 -0.107582 -0.145893 -0.201502 -0.078164 -0.020385 0.101861 -0.005137 -0.203259 0.208998 0.083060 0.322462
Treasury bills: 3 month 0.228918 0.111167 0.202732 0.004675 -0.015006 -0.274253 -0.119133 -0.032772 0.104932 0.042636 -0.161700 -0.009663 0.049030 -0.144779 -0.006277 -0.375099 0.378795
Treasury bills: 6 month 0.231878 0.088394 0.144440 -0.000347 -0.064215 -0.175812 0.048952 -0.000216 -0.009291 0.301415 -0.265119 -0.029288 -0.099210 -0.238959 -0.209504 -0.082658 -0.459009
Treasury bills: 1 year 0.234082 0.053455 0.065061 -0.036268 -0.075239 -0.075118 0.192900 -0.096072 0.369251 0.282200 -0.215109 0.348370 0.066918 0.001790 0.046062 0.547883 0.120218
CPI Inflaction Rate 0.173762 -0.180793 -0.575582 -0.504796 -0.493601 -0.289764 -0.058610 0.010737 -0.074097 -0.086223 -0.010197 -0.031054 -0.013493 -0.045574 0.023215 -0.007326 0.039455
Morgage Rate 0.227671 0.150522 0.160041 -0.041466 -0.149447 0.043348 0.053773 0.891069 0.145476 -0.116283 0.110641 0.020574 0.072023 0.044917 0.027882 -0.039727 -0.025172
Prime Rate 0.232769 0.079864 0.118729 0.003283 -0.045882 -0.147697 -0.017152 0.019519 -0.508541 0.292196 0.064961 0.185499 -0.310753 0.371407 0.050503 -0.003452 -0.308323
House Price Index(house and land) -0.171404 -0.381353 -0.085620 0.663168 -0.268190 -0.450428 0.298552 0.103229 0.016639 -0.009538 0.011201 -0.002338 -0.013328 0.005792 0.027148 -0.033986 0.003009
Unemployment rate -0.181182 0.436201 0.172573 0.184132 -0.757262 0.304913 -0.061990 -0.167199 -0.067842 -0.048952 -0.015296 -0.015750 -0.028583 -0.042345 0.013702 0.025426 0.041119
Real GDP growth Seasonal adjustment -0.003512 -0.702397 0.578009 -0.267773 -0.242523 0.190165 -0.055179 -0.037586 -0.019916 -0.010577 0.004074 -0.000240 -0.015595 -0.011826 0.009457 0.008515 0.003751
top_indicators_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17
GOC Marketable Bonds Average Yield: 1-3 year 0.234760 0.026131 0.016201 -0.012893 -0.024366 0.061716 0.291109 -0.139295 0.133451 0.157802 0.515279 0.182155 0.241015 -0.101838 0.360692 -0.184581 -0.003957
GOC Marketable Bonds Average Yield: 3-5 year 0.234132 -0.029851 -0.072510 0.042401 -0.000326 0.170944 0.217199 -0.078918 0.286689 -0.068549 0.178631 -0.298389 -0.702733 0.011258 0.143179 0.156651 0.048250
GOC benchmark bond yields: 2 year 0.234779 0.013805 0.008394 -0.010948 -0.061867 0.068719 0.297248 -0.173324 0.017306 -0.047074 -0.160140 -0.375752 0.524676 0.397393 0.059814 0.058629 -0.202109
GOC benchmark bond yields: 3 year 0.234758 -0.005337 -0.016723 0.008134 -0.029301 0.139984 0.287062 -0.090465 0.039591 0.137423 -0.004180 -0.218928 -0.028056 -0.277546 -0.505661 -0.334065 0.068860
GOC benchmark bond yields: 7 year 0.231707 -0.078183 -0.090904 0.152294 0.065766 0.199976 0.051898 0.010020 -0.347636 -0.330405 -0.001623 0.058658 0.114303 -0.590334 0.179727 0.203702 -0.278951
GOC benchmark bond yields: long term 0.227639 -0.108980 -0.102382 0.240735 -0.052869 0.050951 -0.511097 -0.023798 -0.019783 0.353407 0.502917 -0.195265 0.169617 0.019484 -0.164477 0.134039 0.065868
Treasury bills: 1 month 0.225278 0.128380 0.259529 0.019614 0.025262 -0.352326 -0.132600 -0.161266 0.086808 -0.541618 0.152998 -0.267793 -0.031216 0.162620 -0.103672 0.115781 -0.119419
Treasury bills: 1 year 0.234082 0.053455 0.065061 -0.036268 -0.075239 -0.075118 0.192900 -0.096072 0.369251 0.282200 -0.215109 0.348370 0.066918 0.001790 0.046062 0.547883 0.120218
CPI Inflaction Rate 0.173762 -0.180793 -0.575582 -0.504796 -0.493601 -0.289764 -0.058610 0.010737 -0.074097 -0.086223 -0.010197 -0.031054 -0.013493 -0.045574 0.023215 -0.007326 0.039455
Morgage Rate 0.227671 0.150522 0.160041 -0.041466 -0.149447 0.043348 0.053773 0.891069 0.145476 -0.116283 0.110641 0.020574 0.072023 0.044917 0.027882 -0.039727 -0.025172
Prime Rate 0.232769 0.079864 0.118729 0.003283 -0.045882 -0.147697 -0.017152 0.019519 -0.508541 0.292196 0.064961 0.185499 -0.310753 0.371407 0.050503 -0.003452 -0.308323
House Price Index(house and land) -0.171404 -0.381353 -0.085620 0.663168 -0.268190 -0.450428 0.298552 0.103229 0.016639 -0.009538 0.011201 -0.002338 -0.013328 0.005792 0.027148 -0.033986 0.003009
Unemployment rate -0.181182 0.436201 0.172573 0.184132 -0.757262 0.304913 -0.061990 -0.167199 -0.067842 -0.048952 -0.015296 -0.015750 -0.028583 -0.042345 0.013702 0.025426 0.041119
Real GDP growth Seasonal adjustment -0.003512 -0.702397 0.578009 -0.267773 -0.242523 0.190165 -0.055179 -0.037586 -0.019916 -0.010577 0.004074 -0.000240 -0.015595 -0.011826 0.009457 0.008515 0.003751
 most_important_economic_factors_df

GOC Marketable Bonds Average Yield: 1-3 year GOC Marketable Bonds Average Yield: 3-5 year GOC benchmark bond yields: 2 year GOC benchmark bond yields: 3 year GOC benchmark bond yields: 7 year GOC benchmark bond yields: long term Treasury bills: 1 month Treasury bills: 1 year CPI Inflaction Rate Morgage Rate Prime Rate House Price Index(house and land) Unemployment rate Real GDP growth Seasonal adjustment
Quarter_Year
2020Q1 1.2 1.1 1.1 1.1 1.1 1.4 1.3 1.2 2.0 4.0 1.8 0.2 4.6 -2.1
2020Q2 0.3 0.4 0.3 0.3 0.4 1.1 0.2 0.3 1.6 3.9 1.2 0.1 7.8 -10.6
2020Q3 0.2 0.3 0.3 0.3 0.4 1.1 0.2 0.2 1.4 3.6 1.1 0.7 5.9 8.9
2020Q4 0.2 0.4 0.2 0.3 0.5 1.2 0.1 0.2 1.7 3.4 1.0 0.5 5.2 2.1
2021Q1 0.2 0.5 0.2 0.3 0.9 1.8 0.1 0.1 1.7 3.3 1.1 1.2 5.5 1.2
2021Q2 0.3 0.8 0.4 0.5 1.2 2.0 0.1 0.2 2.4 3.3 1.2 1.3 5.1 -0.1
2021Q3 0.4 0.8 0.5 0.6 1.1 1.8 0.2 0.3 2.9 3.2 1.2 0.5 4.6 1.6
2021Q4 1.0 1.3 1.0 1.1 1.5 1.9 0.1 0.7 3.1 3.4 1.3 0.6 4.1 1.6
2022Q1 1.6 1.9 1.7 1.8 2.0 2.2 0.2 1.4 4.0 3.6 1.6 1.1 4.3 0.8
2022Q2 2.7 2.8 2.7 2.8 2.8 2.9 1.1 2.6 5.3 4.6 2.5 0.3 3.7 1.1
2022Q3 3.5 3.2 3.5 3.4 3.0 2.9 2.8 3.7 5.8 5.6 3.2 0.0 3.6 0.5
2022Q4 3.9 3.5 3.9 3.7 3.1 3.2 3.9 4.4 6.0 5.8 3.7 -0.1 3.5 -0.0
2023Q1 3.9 3.3 3.8 3.6 3.0 3.1 4.3 4.4 5.9 5.8 3.9 -0.2 3.8 0.6
2023Q2 4.1 3.4 4.1 3.8 3.1 3.1 4.5 4.7 5.3 5.8 4.0 0.0 3.7 0.2
2023Q3 4.8 4.1 4.8 4.5 3.8 3.5 4.9 5.2 4.6 6.1 4.3 -0.1 3.7 -0.1
2023Q4 4.3 3.7 4.3 4.1 3.6 3.4 4.9 4.8 4.0 6.4 4.3 -0.1 3.6 0.1
2024Q1 4.2 3.6 4.1 3.9 3.4 3.3 5.0 4.8 3.1 6.2 4.1 0.0 4.0 0.5
2024Q2 4.3 3.8 4.2 4.1 3.7 3.6 4.8 4.6 2.5 6.1 4.1 0.2 4.0 0.4
 most_important_economic_factors_matrix

GOC Marketable Bonds Average Yield: 1-3 year GOC Marketable Bonds Average Yield: 3-5 year GOC benchmark bond yields: 2 year GOC benchmark bond yields: 3 year GOC benchmark bond yields: 7 year GOC benchmark bond yields: long term Treasury bills: 1 month Treasury bills: 1 year CPI Inflaction Rate Morgage Rate Prime Rate House Price Index(house and land) Unemployment rate Real GDP growth Seasonal adjustment
GOC Marketable Bonds Average Yield: 1-3 year 1.000000 0.993209 0.999462 0.998315 0.977915 0.952575 0.958881 0.996983 0.725500 0.972380 0.989560 -0.744187 -0.746375 -0.031706
GOC Marketable Bonds Average Yield: 3-5 year 0.993209 1.000000 0.994331 0.997760 0.993509 0.973534 0.927301 0.983328 0.756007 0.946726 0.971334 -0.693431 -0.785146 -0.017584
GOC benchmark bond yields: 2 year 0.999462 0.994331 1.000000 0.998947 0.979915 0.955397 0.955022 0.996038 0.734083 0.969309 0.987723 -0.735073 -0.750504 -0.021169
GOC benchmark bond yields: 3 year 0.998315 0.997760 0.998947 1.000000 0.986585 0.962796 0.944663 0.992004 0.740256 0.962145 0.982265 -0.720636 -0.765741 -0.014608
GOC benchmark bond yields: 7 year 0.977915 0.993509 0.979915 0.986585 1.000000 0.988589 0.906120 0.963724 0.735470 0.920864 0.954083 -0.626895 -0.806113 0.007730
GOC benchmark bond yields: long term 0.952575 0.973534 0.955397 0.962796 0.988589 1.000000 0.889460 0.940557 0.726525 0.894927 0.935540 -0.562839 -0.794732 0.025609
Treasury bills: 1 month 0.958881 0.927301 0.955022 0.944663 0.906120 0.889460 1.000000 0.974308 0.569049 0.978951 0.987231 -0.765780 -0.633458 -0.047291
Treasury bills: 1 year 0.996983 0.983328 0.996038 0.992004 0.963724 0.940557 0.974308 1.000000 0.709570 0.982448 0.995893 -0.761639 -0.720879 -0.036478
CPI Inflaction Rate 0.725500 0.756007 0.734083 0.740256 0.735470 0.726525 0.569049 0.709570 1.000000 0.629513 0.666817 -0.541372 -0.751794 0.017866
Morgage Rate 0.972380 0.946726 0.969309 0.962145 0.920864 0.894927 0.978951 0.982448 0.629513 1.000000 0.987620 -0.805028 -0.617461 -0.090534
Prime Rate 0.989560 0.971334 0.987723 0.982265 0.954083 0.935540 0.987231 0.995893 0.666817 0.987620 1.000000 -0.762774 -0.694479 -0.048387
House Price Index(house and land) -0.744187 -0.693431 -0.735073 -0.720636 -0.626895 -0.562839 -0.765780 -0.761639 -0.541372 -0.805028 -0.762774 1.000000 0.388159 0.275270
Unemployment rate -0.746375 -0.785146 -0.750504 -0.765741 -0.806113 -0.794732 -0.633458 -0.720879 -0.751794 -0.617461 -0.694479 0.388159 1.000000 -0.353492
Real GDP growth Seasonal adjustment -0.031706 -0.017584 -0.021169 -0.014608 0.007730 0.025609 -0.047291 -0.036478 0.017866 -0.090534 -0.048387 0.275270 -0.353492 1.000000
Scenario Analysis¶

Macroeconomics KPI Best Case Scenario - Worst Case Scenario and Normal Case Scenario

In [ ]:
 
Recalculate Portfolio Key Performence metricstrics¶

recalculate Expected return, Standard deviation (risk), and Value-at-Risk (VaR).

In [ ]:
 
Visualize the Stress Test Results¶

Visualizing the impact of the stress scenario on the portfolio can help in understanding the potential risks.

Decision Trees in Portfolio Stress Testing¶

In this section, we will use Decision Tree to model how different scenarios might cascade through the portfolio, affecting asset values, returns, and overall portfolio performance.

In [ ]:
 
Interpret the Results¶
  • Expected Return under Stress: Indicates how much the portfolio's return is expected to decrease under the stress scenario.
  • Portfolio Risk under Stress: Shows how much the risk (volatility) increases under the stress scenario.
  • VaR under Stress: Quantifies the potential loss in the portfolio's value at a specified confidence level under stressed conditions.
In [ ]:
 
In [ ]:
 
In [72]:
#historical_data_int = pd.read_csv('~/Documents/FinancialMath/SummerSeminar2016/2016-supervisory-historical-data/SupervisoryhistoricalInternational.csv', date_parser=pd.Period)
In [73]:
#historical_data_dom = pd.read_csv('~/Documents/FinancialMath/SummerSeminar2016/2016-supervisory-historical-data/SupervisoryhistoricalDomestic.csv', date_parser=pd.Period)

Refences¶

Read and print the stock tickers that make up S&P/TSX_Composite_Index¶

https://en.wikipedia.org/wiki/S%26P/TSX_Composite_Index tickersDJIA = pd.read_html( 'https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average')[0] https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average https://en.wikipedia.org/wiki/New_York_Stock_Exchange clusters https://medium.com/pursuitnotes/k-means-clustering-model-in-6-steps-with-python-35b532cfa8ad https://odsc.medium.com/unsupervised-learning-evaluating-clusters-bd47eed175ce https://medium.com/@nusfintech.ml/ml-optimisation-for-portfolio-allocation-9da34e7fe6b1

https://www.scikit-yb.org/en/latest/ https://plotly.com/python/time-series/

print(tickersDJIA.head())

Get the data for the tickers from yahoo finance¶

data = yf.download(tickersDJIA.Symbol.to_list(),'2021-1-1','2021-7-12', auto_adjust=True)['Close'] print(data.head())

  1. Mean Absolute Error (MAE)
  2. Root Mean Squared Error (RMSE)
  3. Mean Absolute Percentage Error (MAPE)
  4. R-Squared Score.
  5. https://www.math.utah.edu/~palais/pcr/spike/Evaluating%20the%20Goodness%20of%20Fit%20--%20Fitting%20Data%20(Curve%20Fitting%20Toolbox)%20copy.html
  6. To compute one standard deviation errors on the parameters, use perr = np.sqrt(np.diag(pcov))
  7. https://statisticsbyjim.com/regression/curve-fitting-linear-nonlinear-regression/
  8. Interpreting the results (coefficient, intercept) and calculating the accuracy of the model
  9. Visualization (plotting a graph)
  10. https://data36.com/linear-regression-in-python-numpy-polyfit/
  11. Importing the Python libraries we will use
  12. Getting the data
  13. Defining x values (the input variable) and y values (the output variable) 15, Machine Learning: fitting the model
  14. Interpreting the results (coefficient, intercept) and calculating the accuracy of the model
  15. machine learning is really what comes before it (data preparation, data cleaning) and what comes after it (interpreting, testing, validating and fine-tuning the model).
  16. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
  17. mean squared error (MSE), R-squared, or adjusted R-squared can be used to assess
  18. https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html https://builtin.com/data-science/step-step-explanation-principal-component-analysis

solving lineair equation¶

**solving lineair equation

confusion matrinx¶

weight optimization¶

https://www.geeksforgeeks.org/data-science-solving-linear-equations-2/?ref=ml_lbp

https://lmfit.github.io/lmfit-py/examples/example_fit_with_bounds.html http://wwwens.aero.jussieu.fr/lefrere/master/SPE/docs-python/scipy-doc/generated/scipy.optimize.curve_fit.html https://lmfit.github.io/lmfit-py/examples/example_fit_with_bounds.html https://datascience.stackexchange.com/questions/65136/get-the-polynomial-equation-with-two-variables-in-python

def func(x, a, b, c): return a np.exp(-b x) + c

xdata = np.linspace(0, 4, 50) y = func(xdata, 2.5, 1.3, 0.5) rng = np.random.default_rng() y_noise = 0.2 * rng.normal(size=xdata.size) ydata = y + y_noise plt.plot(xdata, ydata, 'b-', label='data')

popt, pcov = curve_fit(func, xdata, ydata) plt.plot(xdata, func(xdata, *popt), 'r-', label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

Constrain the optimization to the region of 0 <= a <= 3, 0 <= b <= 1 and 0 <= c <= 0.5:¶

popt, pcov = curve_fit(func, xdata, ydata, bounds=(0, [3., 1., 0.5])) plt.plot(xdata, func(xdata, *popt), 'g--', label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

plt.xlabel('x') plt.ylabel('y') plt.legend() plt.show()

------------¶

Kolmogorov-Smirnov Test¶

The Kolmogorov-Smirnov (K-S) test is a nonparametric test that can be used to evaluate whether a sample comes from a population with a specific continuous distribution.

To perform the K-S test in Python, we can use the scipy.stats.kstest function from the scipy module. from scipy.stats import kstest

Sample data¶

sample = [0.5, 0.4, 0.35, 0.3, 0.25]

Perform the K-S test¶

statistic, p_value = kstest(sample, 'norm')

print(statistic) print(p_value)

import matplotlib.pyplot as plt

Create a figure and axes¶

fig, ax = plt.subplots()

Create a scatter plot of data¶

scatter = ax.scatter(x, y)

Add a hover tooltip to the scatter plot¶

annot = ax.annotate("", xy=(0,0), xytext=(20,20), textcoords="offset points", bbox=dict(boxstyle="round", fc="w"), arrowprops=dict(arrowstyle="->")) annot.set_visible(False)

def update_annot(ind): pos = scatter.get_offsets()[ind["ind"][0]] annot.xy = pos text = f"{ind['ind']}: {pos}" annot.set_text(text)

def hover(event): vis = annot.get_visible() if event.inaxes == ax: cont, ind = scatter.contains(event) if cont: update_annot(ind) annot.set_visible(True) fig.canvas.draw_idle() else: if vis: annot.set_visible(False) fig.canvas.draw_idle()

fig.canvas.mpl_connect("motion_notify_event", hover)

plt.show()

\https://www.quora.com/How-can-you-display-a-tooltip-with-additional-information-about-the-bounding-box-while-it-is-being-hovered-over-in-Python

In [ ]:

In [ ]: